AWS Data Engineering Project Labs
0 Get Ready for the Roller Coaster Ride
Comment - Your effort can change your career trajectory!
1 Create AWS Account & Set Up AWS CLI
1 Intro - AWS Setup
2 Create AWS Account
3 Log in to AWS using the Root User and go to IAM
4 Create Admin User with Console and Programmatic Access
5 Download and Install AWS CLI
6 Create Access Key for AWS CLI Access
7 Configure AWS CLI on your system
8 VIMP - Set up three AWS Budgets
8.5 Set Default Region to us-east-1
9 Comment Needed - Outro - AWS Set Up
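Once the CLI is configured, it is worth sanity-checking the setup before moving on. A minimal sketch, assuming boto3 is installed and the admin user's access key from lessons 6-7 is the default profile with us-east-1 as the region (lesson 8.5):

```python
# Sanity-check the AWS CLI / boto3 setup from this section.
import boto3

session = boto3.Session()
print("Region:", session.region_name)        # expect: us-east-1

sts = session.client("sts")
identity = sts.get_caller_identity()         # fails fast if the keys are wrong
print("Account:", identity["Account"])
print("Caller ARN:", identity["Arn"])        # should be the admin user, not root
```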
2 Set Up for Data Analysis
1a. Intro - Set up for Analysis
1b. Datalake Setup
2. Download Data from NYC TLC
3. Move the downloaded file to your project folder
4. Create the datalake S3 bucket & upload the data to it
5. Comment Needed - Hands On Starts
6. Create Glue Catalog Database
7. Create a crawler to crawl the data and run the crawler
8. View the Parquet Data in Athena
9. Comment Needed - Outro
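For reference, a minimal boto3 sketch of the setup steps in this section: bucket, Glue Catalog database, and a crawler over the raw parquet. All names (bucket, database, crawler, IAM role, local filename) are placeholders, not the course's actual values.

```python
# Sketch of the data lake setup: S3 bucket, Glue database, crawler.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-east-1")

bucket = "my-nyc-taxi-datalake"        # placeholder bucket name
s3.create_bucket(Bucket=bucket)        # us-east-1 needs no LocationConstraint

# Upload the NYC TLC parquet file downloaded earlier (filename is a placeholder).
s3.upload_file("yellow_tripdata_2024-01.parquet",
               bucket, "raw/yellow_tripdata_2024-01.parquet")

# Glue Catalog database to hold the crawled table definitions.
glue.create_database(DatabaseInput={"Name": "nyc_taxi_db"})

# Crawler that infers the parquet schema and registers a table.
glue.create_crawler(
    Name="nyc-taxi-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="nyc_taxi_db",
    Targets={"S3Targets": [{"Path": f"s3://{bucket}/raw/"}]},
)
glue.start_crawler(Name="nyc-taxi-raw-crawler")
# Once the crawler finishes, the table can be queried from Athena.
```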
3 Build an Ingestion (Extract of ETL) Job
1. Comment Below - Mandatory detour to Course RADE™ Agentic Data Engineering with Amazon Q
2. Before the Hands On Continues - VIMP!
3. Create the Glue Ingestion Job
4. Run Glue Crawler again to update the Table Metadata with partitions
5. Create Crawler for Zone Table
6. Comment Below - Next Step is Very Important
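A sketch of what a Glue ingestion job like the one in lesson 3 might look like: read the raw TLC parquet and write it back out partitioned, which is why lesson 4 re-runs the crawler to pick up the new partitions. Paths are placeholders; tpep_pickup_datetime is the yellow-taxi pickup column in the TLC data, but the course's actual script may differ.

```python
# Sketch of a Glue ingestion job: read raw parquet, write it out
# partitioned by year/month so the crawler can register the partitions.
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

df = spark.read.parquet("s3://my-nyc-taxi-datalake/raw/")   # placeholder path

# Derive partition columns from the pickup timestamp.
df = (df.withColumn("year", F.year("tpep_pickup_datetime"))
        .withColumn("month", F.month("tpep_pickup_datetime")))

(df.write.mode("overwrite")
   .partitionBy("year", "month")
   .parquet("s3://my-nyc-taxi-datalake/ingested/"))         # placeholder path

job.commit()
```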
4 Understanding and Gathering Requirements
1. Intro - Understanding and Gathering Requirements
2. Comment - Capstone Requirements and Guidance on Completing It - 3 Options
3. Analysis by Data Analysts, Meeting with Them, and Proposing a 3-Layer Structure
4. Understanding Requirements using the Mapping Sheet
5. Comment - Decide on Technical Requirements & Gear Up for Development
5 Build the data transformation job - Raw to Curated
1 Open the notebook file in your VS Code
2 Start Iterative Development - Load the data
3 Comment - Transform, Validate and Load to Curated
4 Create and Run Glue Script Job
5 Comment - Why did the script run longer?
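The transform-validate-load pattern this section develops iteratively might look roughly like the sketch below. Column names, validation rules, and paths are illustrative assumptions, not the course's actual script.

```python
# Sketch of the raw-to-curated step: load, validate, transform, write.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

raw = spark.read.parquet("s3://my-nyc-taxi-datalake/ingested/")  # placeholder

curated = (
    raw
    # Basic validation: drop records that make no physical sense.
    .filter(F.col("trip_distance") > 0)
    .filter(F.col("fare_amount") >= 0)
    .filter(F.col("tpep_dropoff_datetime") > F.col("tpep_pickup_datetime"))
    # Example derived column.
    .withColumn(
        "trip_duration_min",
        (F.unix_timestamp("tpep_dropoff_datetime")
         - F.unix_timestamp("tpep_pickup_datetime")) / 60.0,
    )
)

(curated.write.mode("overwrite")
        .partitionBy("year", "month")
        .parquet("s3://my-nyc-taxi-datalake/curated/"))          # placeholder
```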
6 Unit Testing, Analysis, and Further Development
0 Comment - What will you learn in this section?
1 Time to unit test before we hand over the data to the data analysts
2 Unit test the curated layer, fill in the unit test document, and hand it over to the data analysts
3 Analyse the Curated to Aggregated Mapping Sheet
4 Create a Glue Notebook, load the curated data, and enrich it with Zone information
5 Create the aggregated tables and load the aggregated data into S3
6 Get the script from the notebook and stop the notebook
7 Get the script vetted by AI
8 Create and Run the Glue ETL job to load from Curated to Aggregated
9 What's the next step?
10 Create aggregated tables with the help of the Crawler
11 Unit test the aggregated data, provide access to the data analysts, and let them know
12 Comment - What have you learnt so far?
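A sketch of the curated-to-aggregated step this section builds, with zone enrichment and a couple of unit-test-style checks before handover. Table, column, and path names are placeholders mirroring the lessons above (PULocationID/LocationID/Zone follow the TLC zone lookup), not the course's actual code.

```python
# Sketch: enrich trips with zone names, aggregate, sanity-check, write.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curated-to-aggregated").getOrCreate()

trips = spark.read.parquet("s3://my-nyc-taxi-datalake/curated/")   # placeholder
zones = spark.read.parquet("s3://my-nyc-taxi-datalake/zones/")     # zone lookup

# Enrich with pickup zone info, then aggregate per zone and month.
agg = (
    trips.join(zones, trips.PULocationID == zones.LocationID, "left")
         .groupBy("Zone", "year", "month")
         .agg(
             F.count("*").alias("trip_count"),
             F.sum("fare_amount").alias("total_fare"),
             F.avg("trip_distance").alias("avg_distance"),
         )
)

# Unit-test style checks before handing over to the analysts.
assert agg.count() > 0, "aggregated output is empty"
assert agg.filter(F.col("total_fare") < 0).count() == 0, "negative fares leaked through"

(agg.write.mode("overwrite")
    .partitionBy("year", "month")
    .parquet("s3://my-nyc-taxi-datalake/aggregated/trips_by_zone/"))  # placeholder
```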
7 Very Important Section - Historical & Incremental Loading
1 Historical vs Monthly Cadence
2 Comment - Code for both historical and current - run the job
3 Why do we even need to think of incremental processing of data?
4 Extremely Important from Interview Perspective - Enable Job bookmarks for incremental processing
4.1 Glue DynamicFrame vs Spark DataFrame
5 Comment - Incremental Processing from Curated to Aggregated
6 Solution to the Previous Exercise
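The job-bookmark pattern from lesson 4, in sketch form. Bookmarks track what a job has already read, so reruns only process new data; this requires the job to be created or updated with --job-bookmark-option job-bookmark-enable and a stable transformation_ctx on the source read. Database and table names are placeholders.

```python
# Sketch of incremental processing with Glue job bookmarks.
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)   # bookmark state is keyed by the job name

# Reading via the catalog with a transformation_ctx lets the bookmark
# remember which partitions/files were already processed.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="nyc_taxi_db",        # placeholder
    table_name="curated_trips",    # placeholder
    transformation_ctx="read_curated",
)

df = dyf.toDF()   # DynamicFrame -> Spark DataFrame (lesson 4.1) for transforms
# ... transforms and the write to the aggregated layer go here ...

job.commit()      # advances the bookmark only after a successful run
```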
8 Orchestration - Build the Pipeline
1. Background for Orchestration
2. Create SNS Topic and Subscribe your email
3. Get the Step Functions Code
4. Build the sequential pipeline using Step Functions and run it
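A boto3 sketch of the pipeline this section assembles: two Glue jobs run sequentially via Step Functions, with an SNS notification if either fails. Job names, topic name, email, and ARNs are all placeholders, not the course's actual state machine.

```python
# Sketch: SNS topic + sequential Glue pipeline as a Step Functions state machine.
import json
import boto3

sns = boto3.client("sns", region_name="us-east-1")
sfn = boto3.client("stepfunctions", region_name="us-east-1")

topic_arn = sns.create_topic(Name="pipeline-alerts")["TopicArn"]
# Subscribe your email, then confirm via the message AWS sends you.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")

definition = {
    "StartAt": "RawToCurated",
    "States": {
        "RawToCurated": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",  # waits for the job
            "Parameters": {"JobName": "raw-to-curated-job"},       # placeholder
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "CuratedToAggregated",
        },
        "CuratedToAggregated": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "curated-to-aggregated-job"},  # placeholder
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": topic_arn, "Message": "Taxi pipeline failed"},
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="nyc-taxi-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsGlueRole",  # placeholder
)
```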
9 Schedule the job - monthly cadence
1. EventBridge Schedule - One-Time Test
2. Set up the Monthly Cadence!
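The monthly cadence might be wired up roughly as below: an EventBridge rule on a cron expression targeting the Step Functions state machine. The rule name, state machine ARN, role ARN, and run time are placeholder assumptions.

```python
# Sketch: trigger the pipeline on the 1st of each month via EventBridge.
import boto3

events = boto3.client("events", region_name="us-east-1")

# cron(minute hour day-of-month month day-of-week year)
events.put_rule(
    Name="nyc-taxi-monthly",
    ScheduleExpression="cron(0 6 1 * ? *)",   # 06:00 UTC on day 1, every month
    State="ENABLED",
)

events.put_targets(
    Rule="nyc-taxi-monthly",
    Targets=[{
        "Id": "taxi-pipeline",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:nyc-taxi-pipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeSfnRole",  # placeholder
    }],
)
```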
10 Where Do You Go Next?