Speed-Dash Data Pipeline

🏗️ Architecture Diagram

(Architecture diagram image: Speed Dash Data Pipeline Architecture; see the Project Screenshots folder.)

This project demonstrates an automated ETL (Extract, Transform, Load) pipeline using AWS services to process daily delivery data for a hypothetical company called SpeedDash. The goal is to provide a simple, scalable, and efficient solution for real-time data processing.

🌟 Project Overview

The Speed-Dash Data Pipeline is a serverless data processing solution that leverages AWS services to automate the extraction, transformation, and loading of daily delivery data. When a JSON file containing delivery records is uploaded to a designated S3 bucket, an AWS Lambda function is triggered. The function filters the records based on delivery status and stores the processed data in a different S3 bucket. Additionally, Amazon SNS is used to send notifications about the status of data processing, ensuring stakeholders are informed in real time.
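
For reference, the event the Lambda function receives when a file lands in the bucket is an S3 event notification; a trimmed example, with illustrative values, looks like this:

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "speed-dash-landing-zone" },
        "object": { "key": "2024-03-09-raw_input.json" }
      }
    }
  ]
}
```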

🚀 Motivation

While Data Engineering projects can often seem complex and daunting, this project showcases a straightforward approach to building a functional data pipeline. By leveraging managed AWS services, the focus remains on solving the problem rather than dealing with infrastructure management. The project is inspired by similar real-world scenarios and is designed to be both educational and practical.

🛠️ AWS Services Used

  1. Amazon S3: Used for storing both raw input files and processed output files. It serves as the data lake for this project.
    Learn More About Amazon S3

  2. AWS Lambda: A serverless compute service that runs the data processing logic whenever a new file is uploaded to the S3 bucket. It is configured to filter records based on delivery status and save the filtered data to another S3 bucket.
    Learn More About AWS Lambda

  3. Amazon SNS (Simple Notification Service): Used to send notifications about the processing status to subscribed users via email.
    Learn More About Amazon SNS

  4. AWS CodeBuild: A fully managed build service used to automate the deployment process of the Lambda function and its dependencies from a GitHub repository.
    Learn More About AWS CodeBuild

📝 Requirements

  • AWS Account
  • Amazon S3 Buckets: speed-dash-landing-zone (for raw files) and speed-dash-target-zone (for processed files)
  • AWS Lambda
  • Amazon SNS
  • AWS IAM Roles (for managing permissions)
  • AWS CodeBuild (for CI/CD)
  • GitHub (for version control)
  • Python, including the pandas library
  • Email Subscription for SNS notifications

🔄 Steps to Implement the Pipeline

  1. Set Up S3 Buckets:

    • Create two S3 buckets: speed-dash-landing-zone (for incoming raw JSON files) and speed-dash-target-zone (for processed files).
  2. Create Sample JSON File:

    • Prepare a sample JSON file (e.g., 2024-03-09-raw_input.json) containing delivery records with various statuses (e.g., "cancelled," "delivered," "order placed"); a sample file is shown after this list.
    • Upload daily JSON files to the speed-dash-landing-zone bucket in the format yyyy-mm-dd-raw_input.json.
  3. Set Up Amazon SNS Topic:

    • Create an SNS topic to send notifications about the processing status.
    • Subscribe an email address to the topic to receive notifications.
  4. Create IAM Role for Lambda:

    • Define an IAM role with the necessary permissions to read from speed-dash-landing-zone, write to speed-dash-target-zone, and publish messages to the SNS topic; an example policy is sketched after this list.
  5. Create and Configure AWS Lambda Function:

    • Develop a Lambda function using Python. Include the pandas library by either packaging it with the function code or using a Lambda Layer. A minimal handler sketch appears after this list.
    • Configure the function to trigger when files are uploaded to speed-dash-landing-zone. The function should:
      • Read the JSON file into a pandas DataFrame.
      • Filter records where status is "delivered."
      • Write the filtered data to a new JSON file in speed-dash-target-zone.
      • Send a success or failure notification via SNS.
  6. Set Up AWS CodeBuild for CI/CD:

    • Host the Lambda function code on GitHub.
    • Create an AWS CodeBuild project linked to the GitHub repository.
    • Use the buildspec.yml file to automate the deployment of Lambda function code updates; a sample buildspec.yml is shown after this list.
  7. Testing and Verification:

    • Upload a sample JSON file to speed-dash-landing-zone and ensure the Lambda function is triggered automatically; example CLI commands appear after this list.
    • Verify that the processed file is correctly stored in speed-dash-target-zone.
    • Check for email notifications to confirm the processing status.
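
💡 Example Snippets

For step 2, a raw input file might look like the following. Only the status field matters to the filtering logic; the other field names are illustrative rather than taken from the project:

```json
[
  {"order_id": "1001", "customer": "A. Rao", "status": "delivered", "delivery_time": "2024-03-09T10:15:00Z"},
  {"order_id": "1002", "customer": "B. Chen", "status": "cancelled", "delivery_time": null},
  {"order_id": "1003", "customer": "C. Silva", "status": "order placed", "delivery_time": null}
]
```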
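
For step 4, the Lambda execution role needs permissions along these lines. The account ID, region, and topic name in the SNS ARN are placeholders, and the standard Lambda logging permissions should be attached as well:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::speed-dash-landing-zone/*"},
    {"Effect": "Allow", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::speed-dash-target-zone/*"},
    {"Effect": "Allow", "Action": "sns:Publish", "Resource": "arn:aws:sns:us-east-1:123456789012:speed-dash-notifications"}
  ]
}
```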
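
For step 5, a minimal handler could look like the sketch below. It assumes the SNS topic ARN is supplied through an environment variable named SNS_TOPIC_ARN and that pandas is available (packaged with the code or via a Layer); it illustrates the flow rather than reproducing the code in the scripts folder:

```python
import io
import json
import os
import urllib.parse

import boto3
import pandas as pd

s3 = boto3.client("s3")
sns = boto3.client("sns")

TARGET_BUCKET = "speed-dash-target-zone"
SNS_TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # assumed environment variable


def lambda_handler(event, context):
    # Work out which object triggered this invocation.
    record = event["Records"][0]
    source_bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    try:
        # Read the raw JSON file into a pandas DataFrame.
        obj = s3.get_object(Bucket=source_bucket, Key=key)
        df = pd.read_json(io.BytesIO(obj["Body"].read()))

        # Keep only records whose status is "delivered".
        delivered = df[df["status"] == "delivered"]

        # Write the filtered data to the target bucket.
        target_key = key.replace("raw_input", "processed")
        s3.put_object(
            Bucket=TARGET_BUCKET,
            Key=target_key,
            Body=delivered.to_json(orient="records"),
        )

        # Notify subscribers that processing succeeded.
        sns.publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject="Speed-Dash processing succeeded",
            Message=f"Filtered {len(delivered)} delivered records from {key} into {target_key}.",
        )
        return {"statusCode": 200, "body": json.dumps({"processed_key": target_key})}
    except Exception as exc:
        # Notify subscribers that processing failed, then surface the error.
        sns.publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject="Speed-Dash processing failed",
            Message=f"Failed to process {key}: {exc}",
        )
        raise
```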
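
For step 6, a buildspec.yml along these lines would package the dependencies and push the bundle to the function. The handler file name (lambda_function.py) and the function name (speed-dash-processor) are placeholders:

```yaml
version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.11
  build:
    commands:
      # Install dependencies next to the handler and zip the bundle.
      - pip install -r requirements.txt -t package
      - cp scripts/lambda_function.py package/
      - cd package && zip -r ../function.zip . && cd ..
      # Push the new bundle to the existing Lambda function.
      - aws lambda update-function-code --function-name speed-dash-processor --zip-file fileb://function.zip
```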
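
For step 7, the pipeline can be exercised from the AWS CLI, assuming credentials are configured for the account that owns the buckets:

```bash
# Drop a sample file into the landing zone; this should trigger the Lambda function.
aws s3 cp 2024-03-09-raw_input.json s3://speed-dash-landing-zone/

# After a few seconds, confirm the processed file landed in the target zone.
aws s3 ls s3://speed-dash-target-zone/
```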

📂 Folder Structure

  • Project Screenshots: Contains snapshots of the architecture diagram and other relevant screenshots.
  • scripts: Contains scripts used for AWS Lambda functions and other processing logic.
  • buildspec.yml: Configuration file for AWS CodeBuild to automate deployment.
  • requirements.txt: Contains Python dependencies for the Lambda function, such as pandas.
