The purpose of this project is to master exploratory data analysis (EDA) with different datasets.
- Explore datasets with the Pandas library.
- Visualize the dataset with various plot types.
The dataset is from Kaggle ("IMDB data from 2006 to 2016") and contains information about 1,000 movies collected from the Internet Movie Database (IMDb), including rating, revenue, year, runtime, and genres. In this analysis, I explore the IMDb movie dataset to gain insights and satisfy my curiosity. I tried to answer a set of questions that may be relevant when analyzing movie data:
- Display Top 10 Rows of The Dataset
- Check Last 10 Rows of The Dataset
- Find Shape of Our Dataset (Number of Rows And Number of Columns)
- Get Information About Our Dataset, Such As Total Number of Rows, Total Number of Columns, Datatypes of Each Column, And Memory Requirement
- Check Missing Values In The Dataset
- Drop All The Missing Values
- Check For Duplicate Data
- Get Overall Statistics About The DataFrame
- Display Titles of Movies Having Runtime Greater Than or Equal to 180 Minutes
- Which Year Had The Highest Average Number of Votes?
- Which Year Had The Highest Average Revenue?
- Find The Average Rating For Each Director
- Display Titles And Runtimes of The Top 10 Longest Movies
- Display Number of Movies Per Year
- Find Most Popular Movie Title (Highest Revenue)
- Display Top 10 Highest Rated Movie Titles And Their Directors
- Display Top 10 Highest Revenue Movie Titles
- Find Average Rating of Movies Year Wise
- Does Rating Affect The Revenue?
- Classify Movies Based on Ratings [Excellent, Good, and Average]
- Count Number of Action Movies
- Find Unique Values From Genre
- How Many Films of Each Genre Were Made?
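Several of the questions above can be sketched with a few Pandas calls. The example below is a minimal illustration, not the project's actual notebook. It assumes the column names `Title`, `Genre`, `Director`, `Year`, `Runtime (Minutes)`, `Rating`, `Votes`, and `Revenue (Millions)`, which may differ from the real file. It runs on a tiny made-up DataFrame so that it works on its own. The file name in the comment and the rating thresholds in `classify_rating` are assumptions as well.

```python
import pandas as pd

# Toy stand-in for the Kaggle CSV. Titles and values are invented for
# illustration only; the column names are assumed, not confirmed.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "Genre": ["Action,Adventure", "Drama", "Action,Sci-Fi", "Comedy,Drama", "Drama"],
    "Director": ["Director X", "Director Y", "Director X", "Director Z", "Director Y"],
    "Year": [2014, 2014, 2015, 2016, 2016],
    "Runtime (Minutes)": [185, 95, 120, 101, 180],
    "Rating": [8.4, 6.1, 7.2, 5.5, 7.9],
    "Votes": [500000, 12000, 250000, 8000, 90000],
    "Revenue (Millions)": [300.5, None, 150.0, 20.3, 75.2],
})
# With the real data this would be something like:
# df = pd.read_csv("IMDB-Movie-Data.csv")  # hypothetical file name

# First/last rows, shape, and general information
print(df.head(10))
print(df.tail(10))
print(df.shape)
df.info()

# Missing values and duplicates
missing_per_column = df.isnull().sum()
has_duplicates = df.duplicated().any()
clean = df.dropna()  # drop every row that has at least one missing value

# Overall statistics of the numeric columns
print(clean.describe())

# Titles of movies with runtime >= 180 minutes
long_titles = clean.loc[clean["Runtime (Minutes)"] >= 180, "Title"].tolist()

# Years with the highest average votes and the highest average revenue
best_votes_year = clean.groupby("Year")["Votes"].mean().idxmax()
best_revenue_year = clean.groupby("Year")["Revenue (Millions)"].mean().idxmax()

# Average rating per director, best first
director_rating = (
    clean.groupby("Director")["Rating"].mean().sort_values(ascending=False)
)

# Number of movies per year
movies_per_year = clean["Year"].value_counts().sort_index()

# Title with the highest revenue, and the top 10 rated titles with directors
top_title = clean.loc[clean["Revenue (Millions)"].idxmax(), "Title"]
top_rated = clean.nlargest(10, "Rating")[["Title", "Director", "Rating"]]


def classify_rating(rating):
    """Map a rating to a label; the thresholds 7.0 and 6.0 are assumptions."""
    if rating >= 7.0:
        return "Excellent"
    if rating >= 6.0:
        return "Good"
    return "Average"


clean = clean.assign(Rating_Category=clean["Rating"].apply(classify_rating))

# Genres are assumed to be comma-separated, so split them into one row each
genres = clean["Genre"].str.split(",").explode()
unique_genres = sorted(genres.unique())
genre_counts = genres.value_counts()
action_count = clean["Genre"].str.contains("Action", case=False).sum()

print(long_titles, best_votes_year, best_revenue_year, top_title)
print(director_rating, movies_per_year, top_rated, genre_counts, sep="\n")
```

Splitting `Genre` and calling `explode()` counts a movie once for each genre it belongs to, which is one reasonable reading of "films of each genre". Counting the raw `Genre` strings instead would treat every genre combination as its own category.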
Libraries used:
- Pandas
- Matplotlib
- Seaborn
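As a rough illustration of how these libraries fit together for the visual questions (movies per year, average revenue per year, and whether rating affects revenue), here is a small sketch. It again uses a made-up DataFrame and assumed column names (`Year`, `Rating`, `Revenue (Millions)`). The plot choices are one possible option, not necessarily the ones used in the project.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented sample values for illustration; column names are assumptions.
df = pd.DataFrame({
    "Year": [2014, 2015, 2016, 2016],
    "Rating": [8.4, 7.2, 5.5, 7.9],
    "Revenue (Millions)": [300.5, 150.0, 20.3, 75.2],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Number of movies per year
sns.countplot(data=df, x="Year", ax=axes[0])
axes[0].set_title("Movies per year")

# Average revenue per year
yearly_revenue = df.groupby("Year")["Revenue (Millions)"].mean()
sns.barplot(x=yearly_revenue.index, y=yearly_revenue.values, ax=axes[1])
axes[1].set_title("Average revenue per year")
axes[1].set_ylabel("Revenue (Millions)")

# Rating against revenue, to eyeball whether the two move together
sns.scatterplot(data=df, x="Rating", y="Revenue (Millions)", ax=axes[2])
axes[2].set_title("Rating vs. revenue")

# A single number to accompany the scatter plot (Pearson correlation)
rating_revenue_corr = df["Rating"].corr(df["Revenue (Millions)"])
print(f"Correlation between rating and revenue: {rating_revenue_corr:.2f}")

fig.tight_layout()
# fig.savefig("imdb_eda_plots.png")  # hypothetical output file name
```

A correlation coefficient only summarizes a linear relationship, so the scatter plot is worth inspecting alongside it before concluding that rating does or does not affect revenue.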
© 2021 Subala Singh