Real-Time-Stock-Market-Data-Pipeline-with-Kafka

This project builds a real-time data engineering pipeline that processes stock market data with Apache Kafka. The pipeline combines Kafka with several AWS services to ingest, store, catalog, and query streaming data.

Technologies Employed

  • Programming Language: Python
  • Amazon Web Services (AWS):
    • S3 (Simple Storage Service)
    • Athena
    • Glue Crawler
    • Glue Catalog
    • EC2
  • Apache Kafka for real-time data streaming
  • SQL for querying data and analysis
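As a rough illustration of the SQL analysis step, the sketch below runs an aggregate query over a few stock records. It uses Python's built-in sqlite3 module as a local stand-in for Athena; the table name `stock_ticks`, its columns, and the sample values are all hypothetical and not taken from this project.

```python
import sqlite3

# sqlite3 is only a local stand-in for Athena in this sketch.
# Table name, columns, and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_ticks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stock_ticks VALUES (?, ?)",
    [("AAA", 10.0), ("AAA", 12.0), ("BBB", 5.0)],
)


def average_price_by_symbol(connection):
    """Return (symbol, average price) rows, ordered by symbol."""
    query = """
        SELECT symbol, AVG(price) AS avg_price
        FROM stock_ticks
        GROUP BY symbol
        ORDER BY symbol
    """
    return connection.execute(query).fetchall()


rows = average_price_by_symbol(conn)
```

A query of the same general shape could be run in Athena once the data has been cataloged, although Athena's SQL dialect differs from SQLite's in places.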

Architecture

[Figure: Project Architecture]

The architecture uses Kafka for real-time data ingestion and AWS services for storage (S3), cataloging (Glue Crawler and Glue Catalog), and querying (Athena). It illustrates a typical data engineering workflow for managing large-scale streaming data.
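The ingestion-to-storage flow described above can be sketched with a few helper functions. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes each record travels through Kafka as UTF-8 JSON and is written to S3 as one object per message. The function names, the `raw` prefix, the key naming scheme, and the sample record are hypothetical.

```python
import json


def serialize_tick(tick):
    """Encode one record as UTF-8 JSON bytes, a common shape for a Kafka message value."""
    return json.dumps(tick, sort_keys=True).encode("utf-8")


def deserialize_tick(payload):
    """Decode a message value back into a dictionary on the consumer side."""
    return json.loads(payload.decode("utf-8"))


def s3_object_key(prefix, sequence_number):
    """Build one S3 object key per consumed message (hypothetical naming scheme)."""
    return f"{prefix}/stock_market_{sequence_number}.json"


tick = {"symbol": "AAA", "price": 10.5}  # made-up sample record
payload = serialize_tick(tick)           # what a producer would send
restored = deserialize_tick(payload)     # what a consumer would read
key = s3_object_key("raw", 0)            # where the consumer might store it
```

With a client library such as kafka-python, functions like these would typically be passed as `value_serializer` to `KafkaProducer` and `value_deserializer` to `KafkaConsumer`. Which Kafka client the project uses is not stated here, so treat that pairing as an assumption.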

Dataset

The project is adaptable to different datasets; its emphasis is on the operational aspects of building and managing the data pipeline. The dataset is available in the repository's files section.
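Because the pipeline does not depend on a particular dataset, one possible way to feed it is to sample rows from any CSV file and treat each row as a streamed event. The sketch below assumes a CSV with a header row; the function name `sample_rows`, the column names, and the values are hypothetical and only illustrate the idea.

```python
import csv
import io
import random


def sample_rows(csv_text, count, seed=None):
    """Pick `count` random rows (with replacement) from CSV text to mimic a stream."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(count)]


# Made-up sample data; any CSV with a header row would work the same way.
SAMPLE_CSV = "symbol,price\nAAA,10.5\nBBB,5.0\n"

all_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
events = sample_rows(SAMPLE_CSV, count=3, seed=0)
```

Each dictionary in `events` could then be serialized and sent to a Kafka topic. Sampling with replacement keeps the simulated stream going even when the source file is small.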