This project builds a cloud-based pipeline to extract NYC taxi data from an API and store it in Azure Data Lake Storage (ADLS). Databricks and PySpark are used to transform the data through the medallion architecture (Bronze → Silver → Gold). Delta Lake ensures reliable storage, and Power BI provides visual insights for data-driven decision-making.
-
Updated
Dec 3, 2024 - Jupyter Notebook