Fantaso/data-analysis-and-manipulation-with-pandas

Data Analysis & Manipulation With Pandas

Data analysis with Pandas, Numpy, Matplotlib & Seaborn.

This project analyses a publicly available movie dataset from https://www.kaggle.com/beyjin/movies-1990-to-2017. It uses Python tools such as Pandas to gain initial insights into the data, then cleans, transforms, and saves a new version of the dataset in a structure better suited to storing the data in a database.





Introduction

There are 3 notebooks, which you should read in this exact order, followed by a note on where the datasets live:

  1. initial_insights.ipynb

    A first look at the raw datasets to find insights that help us understand the data we will be processing, and to get an overview of how we should structure the datasets as if we were going to store the data in a database.

    Note: insights and conclusions can be found in the Jupyter notebook itself.

  2. clean_datasets.ipynb

    Here we go through the whole process of standardizing the data types, extracting columns that belong in a separate dataset, and saving the cleaned datasets.

    Note: target database schema: database-schema

  3. cleaned_datasets_grouped.ipynb

    Here we take the cleaned datasets and join them all together into one single, large dataset.

  4. Raw & Cleaned Datasets

    • The original datasets (raw) are located in the folder orignal_datasets/
    • The output generated datasets (cleaned) will be located in the folder output/
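The three notebook steps above can be sketched roughly as follows. This is a minimal illustration only, not the code from the notebooks: the column names (`title`, `year`, `rating`, `genre`), the `genre_id` key, and the in-memory sample rows are all hypothetical, and the real Kaggle dataset may be structured differently.

```python
import pandas as pd

# Hypothetical raw rows; the real columns in the Kaggle dataset may differ.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": ["1995", "2001", "2010"],
    "rating": ["7.5", "8.1", None],
    "genre": ["Drama", "Comedy", "Drama"],
})

# Step 1 (initial insights): inspect data types and missing values.
print(raw.dtypes)
print(raw.isna().sum())

# Step 2 (cleaning): standardize the data types.
movies = raw.copy()
movies["year"] = pd.to_numeric(movies["year"], errors="coerce").astype("Int64")
movies["rating"] = pd.to_numeric(movies["rating"], errors="coerce")

# Step 2 (cleaning): extract a repeated column into its own dataset,
# keyed by a surrogate id, as one might do for a database table.
genres = movies[["genre"]].drop_duplicates().reset_index(drop=True)
genres["genre_id"] = range(1, len(genres) + 1)
movies = movies.merge(genres, on="genre").drop(columns="genre")

# The cleaned datasets could then be saved, for example with
# movies.to_csv(...) and genres.to_csv(...), into the output folder.

# Step 3 (grouping): join the cleaned datasets back into one dataset.
grouped = movies.merge(genres, on="genre_id", how="left")
print(grouped)
```

Splitting `genre` into its own dataset mirrors the idea of a normalized database schema, while the final merge rebuilds a single flat dataset for analysis.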

Information:

Technology Stack

    • Python: Language
    • Pandas: Data Analysis & Manipulation
    • Numpy: Data Computing
    • Matplotlib: Data Visualization
    • Seaborn: Data Visualization



Maintainer

Get in touch: fantaso
