Contributors:
- Vikas Omer | Amazon Web Services | LinkedIn
- Aneesh Chandra PN | Amazon Web Services | LinkedIn
- Chatchai Komrangded | Amazon Web Services | LinkedIn

Learning outcomes:
- Design a serverless data lake architecture
- Build a data processing pipeline and data lake, using Amazon S3 to store data
- Use Amazon Kinesis for real-time streaming data
- Use AWS Glue to automatically catalog datasets
- Transform data:
  - Run interactive ETL scripts in an Amazon SageMaker Jupyter notebook connected to an AWS Glue development endpoint
  - Use AWS Glue Studio to run and monitor ETL jobs in AWS Glue
  - Use AWS Glue DataBrew to prepare data
  - Use Amazon EMR to run a Spark transformation job
- Load data into Amazon Redshift from AWS Glue
- Get an introduction to Amazon Redshift design best practices
- Query data using Amazon Athena & visualize it using Amazon QuickSight

Prerequisites:
- You need access to an AWS account with AdministratorAccess
- Run this lab in the us-east-1 region
- It is best to follow the links from this guide and open them in a new tab
- Run this lab in a modern browser
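
If you also use the AWS CLI or an SDK alongside the console, the region prerequisite above can be checked with a small preflight script. The following is a minimal sketch, not part of the workshop itself: the function names `configured_region` and `region_ok` are hypothetical, and it only inspects the `AWS_REGION` and `AWS_DEFAULT_REGION` environment variables. It does not read `~/.aws/config` and does not affect the region selected in the browser console.

```python
import os

# Region required by the prerequisites above.
REQUIRED_REGION = "us-east-1"


def configured_region(env=None):
    """Return the region set through environment variables, or None.

    Assumption: only AWS_REGION and AWS_DEFAULT_REGION are consulted;
    profile files such as ~/.aws/config are ignored in this sketch.
    """
    env = os.environ if env is None else env
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def region_ok(env=None):
    """Return True when the configured region matches the lab's region."""
    return configured_region(env) == REQUIRED_REGION


if __name__ == "__main__":
    region = configured_region()
    if region_ok():
        print(f"Region OK: {region}")
    else:
        print(f"Expected {REQUIRED_REGION}, found {region!r}.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out, for example `region_ok({"AWS_REGION": "eu-west-1"})`.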
Module | Link |
---|---|
Ingest and Store | link |
Catalog Data | link |
Transform Data with AWS Glue | link |
Transform Data with AWS Glue Studio | link |
Transform Data with AWS Glue DataBrew | link |
Transform Data with EMR | link |
Analyze with Athena | link |
Visualize with QuickSight | link |
Lambda | link |
Redshift | link |
Cleanup | link |
Please check the prerequisites for each module before starting its activities.
Also, do not forget to clean up the resources at the end of the workshop!