Crowdfunding_ETL

This project builds an ETL pipeline using Python, Pandas, and either Python dictionary methods or regular expressions to extract and transform the data. The transformed data is used to create four CSV files, an ERD, and a table schema. The CSV file data is then loaded into a Postgres database.

Tools Used

Python
Pandas
NumPy
Postgres Database

Extract the crowdfunding.xlsx Data

Read the data into a Pandas DataFrame

    import pandas as pd

    crowdfunding_info_df = pd.read_excel('Resources/crowdfunding.xlsx')
    crowdfunding_info_df.head()

Create the Category and Subcategory DataFrames

Create a Category DataFrame that has the following columns:

A "category_id" column that is numbered sequentially from 1 to the number of unique categories.
A "category" column that has only the categories.

Export the DataFrame as a category.csv CSV file.

Create a SubCategory DataFrame that has the following columns:

A "subcategory_id" column that is numbered sequentially from 1 to the number of unique subcategories.
A "subcategory" column that has only the subcategories.

Export the DataFrame as a subcategory.csv CSV file.
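The two specifications above could be sketched as follows. This is a minimal illustration, not the project's notebook code: the helper `build_lookup_df` and the small sample values are hypothetical, and only the column names and CSV file names come from the description above.

```python
import numpy as np
import pandas as pd


def build_lookup_df(values, id_col, name_col):
    """Pair each unique value with a sequential ID starting at 1.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    unique_values = pd.Series(values).dropna().unique()
    ids = np.arange(1, len(unique_values) + 1)  # 1..n inclusive
    return pd.DataFrame({id_col: ids, name_col: unique_values})


# Small illustrative sample standing in for the real columns.
sample = pd.DataFrame({
    "category": ["food", "music", "food", "theater"],
    "subcategory": ["food trucks", "rock", "food trucks", "plays"],
})

category_df = build_lookup_df(sample["category"], "category_id", "category")
subcategory_df = build_lookup_df(sample["subcategory"], "subcategory_id", "subcategory")

# With no path, to_csv returns the CSV text; pass a file name such as
# "category.csv" to write to disk. index=False leaves out the row index.
category_csv = category_df.to_csv(index=False)
subcategory_csv = subcategory_df.to_csv(index=False)
print(category_csv)
```

Keeping the ID generation in one helper means both lookup tables are numbered the same way, which matters once other tables refer to these IDs.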

Get the crowdfunding_info_df columns.

    crowdfunding_info_df.columns

    Index(['cf_id', 'contact_id', 'company_name', 'blurb', 'goal', 'pledged',
           'outcome', 'backers_count', 'country', 'currency', 'launched_at',
           'deadline', 'staff_pick', 'spotlight', 'category & sub-category'],
          dtype='object')

Assign the category and subcategory values to category and subcategory columns.

    crowdfunding_info_df[['category', 'subcategory']] = crowdfunding_info_df['category & sub-category'].str.extract('(.+)/(.+)', expand=True)
    crowdfunding_info_df.head()
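To see what the pattern does, here is a small self-contained sketch on illustrative values rather than rows read from the workbook. Each `(.+)` is a capture group, and with `expand=True` `str.extract` returns one column per group. Because the first group is greedy, a value containing more than one `/` would be divided at the last slash, whereas `str.split('/', n=1, expand=True)` divides at the first.

```python
import pandas as pd

# Illustrative values in the same "category/subcategory" shape as the source column.
combined = pd.Series(["food/food trucks", "music/rock", "film & video/documentary"])

# One output column per capture group; rows that do not match become NaN.
parts = combined.str.extract(r"(.+)/(.+)", expand=True)
parts.columns = ["category", "subcategory"]
print(parts)

# Alternative that splits on the first "/" only (columns are labelled 0 and 1).
split_parts = combined.str.split("/", n=1, expand=True)
print(split_parts)
```

For values with exactly one slash, as in this sample, both approaches give the same two columns.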

Get the unique categories and subcategories in separate lists.

    categories = crowdfunding_info_df['category'].unique()
    subcategories = crowdfunding_info_df['subcategory'].unique()
    print(categories)
    print(subcategories)

Get the number of distinct values in the categories and subcategories lists.
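One way to get these counts, sketched on illustrative data: `unique()` returns a NumPy array, so `len()` gives the number of distinct values. `Series.nunique()` returns the same number directly as long as the column has no missing values (it skips NaN by default, while `unique()` keeps it). The DataFrame `df` below is a hypothetical stand-in for the real one.

```python
import pandas as pd

# Illustrative stand-in for the DataFrame after the category/subcategory split.
df = pd.DataFrame({
    "category": ["food", "music", "food", "theater"],
    "subcategory": ["food trucks", "rock", "food trucks", "plays"],
})

categories = df["category"].unique()
subcategories = df["subcategory"].unique()

# len() of each unique array is the number of distinct values.
n_categories = len(categories)
n_subcategories = len(subcategories)
print(n_categories, n_subcategories)

# nunique() counts distinct non-null values without building the array first.
same_count = df["category"].nunique() == n_categories
```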

Create numpy arrays from 1-9 for the categories and 1-24 for the subcategories.
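A sketch of this step using `np.arange`, whose stop value is exclusive, so the upper bound is one past the last ID wanted. The ranges 1-9 and 1-24 are the ones stated above; the variable names are illustrative.

```python
import numpy as np

# np.arange(start, stop) excludes stop, so use 10 and 25 to end at 9 and 24.
category_ids = np.arange(1, 10)
subcategory_ids = np.arange(1, 25)

print(category_ids)
print(subcategory_ids)
```

Deriving the bounds from the counts, for example `np.arange(1, len(categories) + 1)`, avoids hard-coding them if the source data changes.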
