IMDb-Data-Analysis-Exercise-Part-1

This project explores IMDb data with Python and visualization tools, uncovering title release patterns and viewer preferences. It applies regression models to predict title ratings and lays the groundwork for recommender systems for TV shows and movies, or for revenue prediction models, built on IMDb data.

Data Source

The primary data source for this analysis is IMDb, an extensive online database that provides detailed information about films, TV series, podcasts, video games, and other media content.

Analysis

  • Initiated the exploratory analysis by identifying the time span of the dataset and categorizing titles by type and genre.
  • Visualized the number of titles released each year, identifying predominant title types like TV episodes, movies, and short films.
  • Explored viewer preferences, identifying Drama, Comedy, and Documentary as the most popular genres.
  • Analyzed title runtime trends over the years, highlighting shifts in movie and TV episode durations.
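
The steps above can be sketched with pandas. This is a minimal illustration rather than the notebook's own code: the small in-memory table is made-up sample data, and the column names (`titleType`, `startYear`, `runtimeMinutes`, `genres`) are an assumption that the data follows the layout of IMDb's public `title.basics` file.

```python
import pandas as pd

# Made-up sample rows; columns are assumed to mirror IMDb's title.basics layout.
titles = pd.DataFrame(
    {
        "titleType": ["movie", "tvEpisode", "short", "movie", "tvEpisode"],
        "startYear": [1999, 2005, 2005, 2005, 2010],
        "runtimeMinutes": [120, 42, 12, 95, 44],
        "genres": ["Drama", "Comedy,Drama", "Documentary", "Comedy", "Drama"],
    }
)

# Time span covered by the dataset.
year_span = (int(titles["startYear"].min()), int(titles["startYear"].max()))

# Number of titles per year and title type (rows: year, columns: type).
titles_per_year = (
    titles.groupby(["startYear", "titleType"]).size().unstack(fill_value=0)
)

# Genre popularity: split the comma-separated genre string into one row per genre.
genre_counts = titles["genres"].str.split(",").explode().value_counts()

# Median runtime per year and title type, to follow duration trends over time.
median_runtime = titles.groupby(["startYear", "titleType"])["runtimeMinutes"].median()
```

With Matplotlib installed, `titles_per_year.plot()` would draw one line per title type, which is one way to produce the release-count chart described above.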

Libraries Used

The analysis utilizes the following Python libraries and packages:

  • Seaborn: For enhanced data visualization.
  • Scikit-learn (sklearn): For machine learning and data preprocessing (mean_squared_error, LinearRegression, PolynomialFeatures, RandomForestRegressor, train_test_split, OneHotEncoder).
  • Matplotlib: For data visualization.
  • NumPy: For numerical computations.
  • Pandas: For data manipulation and analysis.
  • urllib: For URL handling and web access.
  • os: For interacting with the operating system.
  • io: For handling streams.
  • gzip: For working with gzipped files.
  • zipfile: For extracting and creating zip archives.
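
As a rough sketch of how the file-handling libraries above can fit together, the following reads a gzipped, tab-separated file into a DataFrame. The URL, the function names `read_gzipped_tsv` and `download_table`, and the sample rows are assumptions made for illustration; the notebook may load different files or use a different approach.

```python
import csv
import gzip
import io
import urllib.request

import pandas as pd

# Assumed URL of one of IMDb's public dataset files; the notebook may use others.
BASICS_URL = "https://datasets.imdbws.com/title.basics.tsv.gz"


def read_gzipped_tsv(raw_bytes: bytes) -> pd.DataFrame:
    """Decompress gzipped TSV bytes and parse them into a DataFrame."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8") as handle:
        # IMDb's dataset files mark missing values with the literal string \N.
        return pd.read_csv(
            handle, sep="\t", na_values="\\N", quoting=csv.QUOTE_NONE
        )


def download_table(url: str = BASICS_URL) -> pd.DataFrame:
    """Fetch a gzipped TSV over HTTP and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return read_gzipped_tsv(response.read())


# Offline demonstration with a tiny made-up table instead of a real download.
sample_tsv = (
    "tconst\ttitleType\truntimeMinutes\n"
    "tt9990001\tshort\t1\n"
    "tt9990002\tmovie\t\\N\n"
)
sample = read_gzipped_tsv(gzip.compress(sample_tsv.encode("utf-8")))
```

Keeping the parsing step separate from the download makes it possible to try the parser on small local samples without network access.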

Key Achievements

  • Successfully analyzed and visualized IMDb data, uncovering key trends and patterns in title releases and viewer preferences.
  • Applied regression models, including linear, polynomial, and random forest, to predict title ratings based on runtime, gaining insights into factors influencing viewer ratings.
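
The model comparison described above might look roughly like the sketch below. It uses synthetic data rather than IMDb ratings, and the polynomial degree, forest size, random seeds, and the use of `make_pipeline` are all assumptions chosen for illustration, not the notebook's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data: runtime in minutes and a noisy rating on a 1-10 scale.
rng = np.random.default_rng(0)
runtime = rng.uniform(20, 180, size=300).reshape(-1, 1)
rating = np.clip(6.0 + 0.01 * runtime.ravel() + rng.normal(0, 1, size=300), 1, 10)

X_train, X_test, y_train, y_test = train_test_split(
    runtime, rating, test_size=0.2, random_state=0
)

# Three regressors that all predict rating from runtime alone.
models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
```

Comparing the values in `test_mse` shows which model generalizes best on held-out data; with a single predictor such as runtime, the differences between models may well be small.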

Conclusion

The "IMDb-Data-Analysis-Exercise-Part-1" project provides an in-depth look at IMDb data, revealing insights into media consumption trends, viewer preferences, and title characteristics. This foundational analysis sets the stage for more advanced studies, including the development of recommendation systems.

Future Work

The next phase, "Part 2", will focus on obtaining data directly from the IMDb database through its API. It will also build a recommender system for TV shows and movies using the comprehensive IMDb dataset.

Note

To fully understand the conclusions drawn in this analysis, it is recommended to go through the entire notebook, including the code and its outputs. You can view the HTML version of the notebook here.

Author

Jesus Cantu Jr.

Last Updated

June 6, 2023