Skip to content

ETL pipeline to build a campaign crowdfunding database from Excel. Used Python, Pandas, and PostgreSQL.

Notifications You must be signed in to change notification settings

zonaj2/ETL-Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project 2: ETL Pipeline

Table of Contents

About

In this project, I implemented an ETL (Extract, Transform, Load) pipeline using Python, Pandas, and PostgreSQL. The goal was to import data from an Excel file, transform it to CSV files, and load the CSV files into a PostgreSQL database.

Key Steps:

  1. Created Category and Subcategory DataFrames
    • Imported data from crowdfunding.xlsx using pd.read_excel.
    • Created a category DataFrame with "category_id" and "category" columns using str.split and list comprehension.
    • Created a subcategory DataFrame with "subcategory_id" and "subcategory" columns using str.split and list comprehension.

  1. Created the Campaign DataFrame

    • Renamed and dropped unnecessary columns.
    • Changed data types (e.g., string to float, datetime).
    • Performed a SQL left join on category and subcategory tables to create a merged dataframe.
  2. Created the Contacts DataFrame

    • Transformed data from contacts.xlsx into a contacts dataframe using Python dictionary methods.
    • Split the "name" column into first and last names.
    • Cleaned and exported the DataFrame as contacts.csv.
  3. Created the Crowdfunding Database

    • Designed the database schema based on the ERD generated using Quick DBD.

    Database Schema

    • Created tables with appropriate data types, primary keys, and foreign keys.
    • Imported data from CSV files into the PostgreSQL database.
    • Verified data integrity by running SELECT statements.

Resources

Database Tables

Category Table


Subcategory Table


Contacts Table


Campaign Table

About

ETL pipeline to build a campaign crowdfunding database from Excel. Used Python, Pandas, and PostgreSQL.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published