Created a Data Warehouse of COVID-19 data (Cases & Deaths, Hospital Admissions and more) and developed a complete Data Pipeline using Azure Data Factory & Databricks. Data visualization was done in Power BI.
-
Cloned the project repository from GitHub.
-
The step above can be skipped by fetching the data directly from the ECDC API.
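A minimal sketch of fetching the data over HTTP instead of cloning the repository. The URL is a hypothetical placeholder (this README does not give the ECDC endpoint), and the names `parse_csv` and `fetch_rows` are made up for illustration.

```python
import csv
import io
import urllib.request

# Hypothetical placeholder: substitute the real ECDC dataset URL here.
ECDC_CSV_URL = "https://example.org/ecdc/cases_deaths.csv"


def parse_csv(text):
    """Parse CSV text into a list of dict rows keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url=ECDC_CSV_URL, timeout=30):
    """Download a CSV file over HTTP(S) and return its parsed rows."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # utf-8-sig drops a byte-order mark if the file starts with one.
        text = resp.read().decode("utf-8-sig")
    return parse_csv(text)
```

The parsing step is kept separate from the download so it can be checked offline against a small sample file.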
-
Developed a Data Pipeline in Azure Data Factory
◾ Fetched data from GitHub to Azure Blob Storage.
◾ Processed data by applying various transformations, as required, using:
▪ Dataflows in Data Factory
▪ PySpark in Azure Databricks, which also writes the data to Azure SQL DB.
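The write from Databricks to Azure SQL DB could look roughly like the sketch below, which uses Spark's generic JDBC writer. The function names and all connection parameters (`server`, `database`, `table`, `user`, `password`) are assumptions for illustration, not the project's actual code.

```python
def jdbc_url(server, database):
    """Build a SQL Server JDBC URL for an Azure SQL DB logical server."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;"
        "trustServerCertificate=false;loginTimeout=30"
    )


def write_to_sql(df, server, database, table, user, password, mode="append"):
    """Write a Spark DataFrame to an Azure SQL DB table over JDBC."""
    (
        df.write.format("jdbc")
        .option("url", jdbc_url(server, database))
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        # Microsoft SQL Server JDBC driver class; it must be on the cluster classpath.
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .mode(mode)
        .save()
    )
```

In a real workspace the credentials would normally come from a secret store rather than being passed as plain strings.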
-
Created a Data Lake to store raw and processed data.
-
Developed a Data Warehouse in Azure SQL DB (DDL Command & Pyspark_code_in_SQL) and masked the sensitive data using PySpark functionality (Pyspark code).
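One possible way to mask sensitive columns in PySpark is sketched below. It assumes one-way SHA-256 hashing, which may differ from the masking the project actually applies, and the column names in the test are made up. Only `DataFrame.selectExpr` and the Spark SQL `sha2` function are used.

```python
def mask_expressions(columns, sensitive):
    """Return Spark SQL select expressions that hash the sensitive columns."""
    exprs = []
    for name in columns:
        if name in sensitive:
            # sha2(expr, 256) yields a hex-encoded SHA-256 digest string.
            exprs.append(f"sha2(cast(`{name}` as string), 256) AS `{name}`")
        else:
            # Non-sensitive columns pass through unchanged.
            exprs.append(f"`{name}`")
    return exprs


def mask_dataframe(df, sensitive):
    """Return a new DataFrame with the sensitive columns replaced by hashes."""
    return df.selectExpr(*mask_expressions(df.columns, set(sensitive)))
```

Hashing keeps masked values joinable across tables, since equal inputs give equal digests, but it is not reversible.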
-
To extract insights, the data from Azure SQL DB was loaded into Power BI Desktop.
-
Azure services used:
◽ Azure Data Factory (Dataflows, Linked Services, Triggers, Azure Databricks)
◽ Azure Blob Storage
◽ Azure Data Lake Storage Gen 2
◽ Azure SQL DB