Extract, Transform, Load

Team Members

Dianne Jardinez, Aastha Arora, Swarna Latha

Project Summary

The objective of this project was to extract data from websites and available APIs. The extracted datasets were then transformed by cleaning, joining, and filtering into nine tables. The tables were loaded into PostgreSQL, an object-relational database, and managed through pgAdmin.

Finding Data

The following data sources were used:

IMDb Website
- Method: Web scraping
- Used for: Collecting the list of the Top 250 IMDb-rated movies
OMDb API
- Method: API extraction
- Used for: Collecting the IMDb ID and other movie details such as actors and director
Utelly API
- Method: API extraction
- Used for: Collecting streaming options for the Top 250 IMDb movies
uNoGS API
- Method: API extraction
- Used for: Collecting movies on Netflix released in the United States that have an IMDb rating between 7 and 10
Google Search Engine
- Method: Web scraping
- Used for: Collecting streaming service availability and price
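
As an illustration of the API extraction method listed above, the following is a minimal sketch of an OMDb title lookup. It is not the project's notebook code: the function names (`build_omdb_url`, `parse_omdb_movie`, `fetch_movie`) and the output field names are assumptions chosen for the example, and a valid OMDb API key is required to run `fetch_movie`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "http://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Build an OMDb lookup URL (t = movie title, apikey = your own key)."""
    return OMDB_URL + "?" + urlencode({"t": title, "apikey": api_key})


def parse_omdb_movie(payload):
    """Reduce an OMDb JSON response to the fields of interest.

    The output keys (imdb_id, title, director, actors) are hypothetical
    names for this sketch, not the project's actual column names.
    """
    if payload.get("Response") == "False":
        return None  # OMDb signals a failed lookup with Response == "False"
    actors = payload.get("Actors", "")
    return {
        "imdb_id": payload.get("imdbID"),
        "title": payload.get("Title"),
        "director": payload.get("Director"),
        # OMDb returns actors as one comma-separated string
        "actors": [name.strip() for name in actors.split(",") if name.strip()],
    }


def fetch_movie(title, api_key):
    """Request one movie from OMDb and return the parsed record (needs network)."""
    with urlopen(build_omdb_url(title, api_key), timeout=10) as response:
        return parse_omdb_movie(json.load(response))
```

Keeping the URL construction and the parsing separate from the network call makes both easy to check offline, for example against a saved JSON response.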

Data Cleanup & Analysis

The extracted data were formatted as CSV and JSON files.
The datasets were then transformed by cleaning, joining, and filtering into nine tables.
The tables were loaded into PostgreSQL, an object-relational database, and managed through pgAdmin. A relational database was selected because the data was in a structured format.
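
The cleaning, joining, and filtering steps described above can be sketched as follows. This is only an illustration using the Python standard library, not the code in Transform.ipynb: the column names (`imdb_id`, `title`, `rating`, `service`) and the function names are assumptions, and the 7 to 10 rating range is borrowed from the uNoGS criterion mentioned earlier.

```python
def clean_movies(rows):
    """Clean: trim text, drop rows without an id, drop duplicate ids,
    and convert the rating to a float (None when it cannot be parsed)."""
    cleaned, seen = [], set()
    for row in rows:
        imdb_id = (row.get("imdb_id") or "").strip()
        if not imdb_id or imdb_id in seen:
            continue
        seen.add(imdb_id)
        try:
            rating = float(row.get("rating", ""))
        except (TypeError, ValueError):
            rating = None
        cleaned.append({
            "imdb_id": imdb_id,
            "title": (row.get("title") or "").strip(),
            "rating": rating,
        })
    return cleaned


def join_streaming(movies, streaming):
    """Join: attach a sorted list of streaming services to each movie,
    matching on imdb_id (movies without a match get an empty list)."""
    services_by_id = {}
    for option in streaming:
        services_by_id.setdefault(option["imdb_id"], []).append(option["service"])
    return [
        dict(movie, services=sorted(services_by_id.get(movie["imdb_id"], [])))
        for movie in movies
    ]


def filter_by_rating(movies, low=7.0, high=10.0):
    """Filter: keep movies whose rating lies in [low, high]."""
    return [
        movie for movie in movies
        if movie["rating"] is not None and low <= movie["rating"] <= high
    ]
```

The three functions can be chained, for example `filter_by_rating(join_streaming(clean_movies(rows), streaming))`, where `rows` and `streaming` are lists of dictionaries such as those produced by `csv.DictReader` or `json.load`.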

Project Report

Extract:
- Google scraping.ipynb:
  - contains IMDb website and Google Search Engine web scraping
- netflix_high_imdb_rated(uNoGS api).ipynb:
  - contains IMDb website web scraping, OMDb API extraction, and uNoGS API extraction
- streaming_options(utelly api).ipynb:
  - contains Utelly API extraction
Transform:
- Transform.ipynb:
  - contains all datasets that were transformed into nine tables
Load:
- SQL folder:
  - contains ERD and schema
- SQL_Table folder:
  - contains the table-creation code and all nine tables created in PostgreSQL through pgAdmin
- Project Report document:
  - contains detailed project description and sample PostgreSQL queries in pgAdmin
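
To illustrate the load step, here is a minimal sketch of creating two related tables, inserting rows, and running a join query. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server; the project itself used PostgreSQL with pgAdmin, and the table and column names below are hypothetical, not the schema stored in the SQL folder.

```python
import sqlite3

# Hypothetical two-table schema; the real project defines nine tables.
SCHEMA = """
CREATE TABLE movies (
    imdb_id TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    rating  REAL
);
CREATE TABLE streaming_options (
    imdb_id TEXT NOT NULL REFERENCES movies (imdb_id),
    service TEXT NOT NULL,
    PRIMARY KEY (imdb_id, service)
);
"""


def load_tables(conn, movies, streaming):
    """Create the tables and bulk-insert rows given as tuples."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO movies (imdb_id, title, rating) VALUES (?, ?, ?)", movies
    )
    conn.executemany(
        "INSERT INTO streaming_options (imdb_id, service) VALUES (?, ?)", streaming
    )
    conn.commit()


def services_for(conn, title):
    """Return the streaming services for a movie title, sorted by name."""
    rows = conn.execute(
        "SELECT s.service FROM movies AS m "
        "JOIN streaming_options AS s ON s.imdb_id = m.imdb_id "
        "WHERE m.title = ? ORDER BY s.service",
        (title,),
    )
    return [row[0] for row in rows]
```

With a connection from `sqlite3.connect(":memory:")`, `load_tables` followed by `services_for` exercises the whole path. Against PostgreSQL the same SQL statements would apply through a driver such as psycopg2, with `%s` placeholders instead of `?`.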


Aastha-Arora/Movies-ETL
