NYC Taxi Data Engineering Pipeline (Yellow Taxi Trip Records)

This project builds a data engineering pipeline using NYC Yellow Taxi trip data from 2022 and 2023. The pipeline extracts, transforms and loads (ETL) the raw trip records into a Snowflake database, and a dashboard is then built on top of the warehouse to visualise insights from the data. The goal is to consolidate, clean and store large volumes of taxi trip data in Snowflake so that it can be analysed and visualised efficiently.
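As an illustration of the extract step, the sketch below downloads the monthly Yellow Taxi Parquet files published by the NYC TLC for 2022 and 2023. The CDN URL, output directory and function name are assumptions for illustration and may differ from the project's actual scripts.

    # Illustrative extract step: download monthly Yellow Taxi Parquet files.
    # The URL pattern and local paths are assumptions, not the project's config.
    from pathlib import Path
    import requests

    BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"  # assumed TLC file host
    OUT_DIR = Path("data/raw")

    def download_month(year: int, month: int) -> Path:
        """Fetch one month of Yellow Taxi trip records and save it locally."""
        OUT_DIR.mkdir(parents=True, exist_ok=True)
        file_name = f"yellow_tripdata_{year}-{month:02d}.parquet"
        target = OUT_DIR / file_name
        response = requests.get(f"{BASE_URL}/{file_name}", timeout=60)
        response.raise_for_status()
        target.write_bytes(response.content)
        return target

    if __name__ == "__main__":
        for year in (2022, 2023):
            for month in range(1, 13):
                print("Downloaded", download_month(year, month))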

If you find this project useful, kindly consider giving it a star ⭐ on GitHub.

Project Setup

  1. Clone the Repository:

    git clone https://github.com/nafisalawalidris/NYC_Taxi_Data_Pipeline.git
    cd NYC_Taxi_Data_Pipeline
  2. Create and Activate a Virtual Environment:

    python -m venv nyc_taxi_env
    .\nyc_taxi_env\Scripts\Activate  # On Windows
    source nyc_taxi_env/bin/activate  # On macOS/Linux
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Run the Scripts:

    • Follow the instructions in the scripts to extract, transform and load the data (a sketch of the load step is shown below).
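
As an illustration of the transform-and-load step, the sketch below reads one monthly Parquet file, applies a simple cleaning filter and bulk-loads it into Snowflake using the connector's write_pandas helper (requires snowflake-connector-python with the pandas extra). The connection parameters, database, table name and cleaning rules are placeholders rather than the project's actual configuration.

    # Illustrative load step: bulk-load a cleaned DataFrame into Snowflake.
    # Connection parameters, database and table names are placeholders.
    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    def load_parquet_to_snowflake(parquet_path: str) -> None:
        df = pd.read_parquet(parquet_path)

        # Example transform: drop records with non-positive fares or distances.
        df = df[(df["fare_amount"] > 0) & (df["trip_distance"] > 0)]

        conn = snowflake.connector.connect(
            user="<USER>",
            password="<PASSWORD>",
            account="<ACCOUNT>",
            warehouse="<WAREHOUSE>",
            database="NYC_TAXI",  # placeholder database
            schema="PUBLIC",
        )
        try:
            # write_pandas stages the DataFrame and bulk-loads it into the table.
            write_pandas(conn, df, table_name="YELLOW_TRIPS", auto_create_table=True)
        finally:
            conn.close()

    if __name__ == "__main__":
        load_parquet_to_snowflake("data/raw/yellow_tripdata_2022-01.parquet")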

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any inquiries or suggestions, please contact Nafisa Lawal Idris.
