NYC-payroll-project

Project Introduction

The City of New York would like to develop a Data Analytics platform on Azure Synapse Analytics to accomplish two primary objectives:

Analyze how the City's financial resources are allocated and how much of the City's budget is being devoted to overtime.

Make the data available to the interested public to show how the City’s budget is being spent on salary and overtime pay for all municipal employees.

The main goals are to create high-quality data pipelines that are dynamic, can be automated, and can be monitored for efficient operation. The project team also includes the City’s quality assurance experts, who will test the pipelines to find errors and improve overall data quality.

The source data resides in Azure Data Lake and needs to be processed in an NYC data warehouse in Azure Synapse Analytics. The source datasets consist of CSV files with employee master data and monthly payroll data entered by various City agencies.
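
To make the overtime objective concrete, here is a minimal Python sketch that computes the share of total pay that goes to overtime from payroll-style CSV rows. The column names (`AgencyName`, `RegularGrossPaid`, `TotalOTPaid`) and the sample figures are assumptions for illustration only, not the schema or contents of the actual source files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names and amounts are illustrative only.
SAMPLE_CSV = """AgencyName,RegularGrossPaid,TotalOTPaid
Agency A,1000.00,250.00
Agency A,2000.00,0.00
Agency B,1500.00,500.00
"""


def overtime_share_by_agency(csv_text):
    """Return {agency: overtime / (regular + overtime)} for each agency."""
    regular = defaultdict(float)
    overtime = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        regular[row["AgencyName"]] += float(row["RegularGrossPaid"])
        overtime[row["AgencyName"]] += float(row["TotalOTPaid"])

    shares = {}
    for agency in regular:
        total = regular[agency] + overtime[agency]
        # Guard against agencies with no pay recorded at all.
        shares[agency] = overtime[agency] / total if total else 0.0
    return shares


shares = overtime_share_by_agency(SAMPLE_CSV)
```

In the real platform this aggregation would run in the warehouse rather than in a script; the sketch only shows the calculation behind the first objective.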

Project Environment

For this project, I worked in the Azure Portal, using several Azure resources, including:

Azure Data Lake Gen2

Azure SQL Database

Azure Data Factory

Azure Synapse Analytics

Project steps

The project was divided into six steps to keep the work organized and manageable.

Step 1: Prepare the Data Infrastructure

The data infrastructure involves the creation of the following resources:

An Azure Data Lake Storage Gen2 storage account and an associated storage container for uploading the raw data.
An Azure Data Factory resource.
A SQL Database and the table that stores the current-year data.
A Synapse Analytics workspace and the master data tables.

Step 2: Create Linked Services

In Azure Data Factory, three linked services were created:

To Azure Data Lake
To SQL Database
To Synapse Analytics
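
Data Factory represents each linked service as a JSON definition. The Python sketch below builds three such definitions as plain dictionaries, one per service listed above. The service names (`ls_datalake`, `ls_sqldb`, `ls_synapse`), the URL, and the connection strings are placeholders I made up for illustration, and the property layout is a simplified approximation that should be checked against the definitions exported from the factory.

```python
import json


def linked_service(name, service_type, type_properties):
    """Build a Data Factory-style linked service definition as a dict."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": type_properties,
        },
    }


# All names, URLs, and connection strings below are placeholders.
linked_services = [
    linked_service(
        "ls_datalake",
        "AzureBlobFS",  # connector type for Data Lake Storage Gen2
        {"url": "https://<storage-account>.dfs.core.windows.net"},
    ),
    linked_service(
        "ls_sqldb",
        "AzureSqlDatabase",
        {"connectionString": "<sql-database-connection-string>"},
    ),
    linked_service(
        "ls_synapse",
        "AzureSqlDW",  # connector type for a Synapse dedicated SQL pool
        {"connectionString": "<synapse-connection-string>"},
    ),
]

for definition in linked_services:
    print(json.dumps(definition, indent=2))
```

Credentials are left as placeholders on purpose; in practice they belong in a secret store rather than in the definition itself.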

Step 3: Create Datasets in Azure Data Factory

In Azure Data Factory, datasets were created to load the raw data and to save the transformed data.
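
A dataset points at a linked service and describes where the data sits inside it. As a rough sketch, the Python below builds one source dataset for a CSV file in the data lake and one sink dataset for a database table. Every name here (`ds_payroll_csv`, `ds_payroll_table`, `ls_datalake`, `ls_sqldb`, the container, file, schema, and table) is hypothetical, and the property layout is a simplified approximation of Data Factory's JSON rather than a copy of this project's definitions.

```python
def delimited_text_dataset(name, linked_service_name, file_system, file_name):
    """Sketch of a CSV dataset stored in Data Lake Storage Gen2."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": file_system,  # the storage container
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


def table_dataset(name, dataset_type, linked_service_name, schema, table):
    """Sketch of a table dataset, e.g. AzureSqlTable or AzureSqlDWTable."""
    return {
        "name": name,
        "properties": {
            "type": dataset_type,
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


# Hypothetical names throughout; replace with the real container, file, and table.
source = delimited_text_dataset("ds_payroll_csv", "ls_datalake", "raw", "payroll.csv")
sink = table_dataset("ds_payroll_table", "AzureSqlTable", "ls_sqldb", "dbo", "Payroll")
```

Keeping the file name and table name as function arguments mirrors how datasets can later be parameterized instead of being duplicated per file.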

egoliveira1/NYC-payroll-project
