Abstract: Our group uses two datasets to explore Netflix’s coverage of movies and TV shows. The project has two parts: general trends in productions and text analysis of their content. In the first part, we examine how production characteristics change over time and how extensively the platform covers international content. In the second part, we analyze the key characteristics that contribute to a show (TV series or movie) being “high rated” on Netflix. Together, these analyses give a general picture of how film and television programming has developed and of the public’s content preferences.
Kaggle Datasets: For the first part of the analysis, we use https://www.kaggle.com/code/joshuaswords/netflix-data-visualization/data
- The data consists of TV shows and movies available on Netflix from 2008 to 2021. It was collected from Flixable, a third-party Netflix search engine.
- It contains basic information about each production, such as its TV Parental Guidelines rating, release year, and genres.
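The trend and international-coverage questions in the first part come down to counting titles by year, type, and country. The following is a minimal sketch using only the Python standard library. It assumes the CSV has columns named `type`, `release_year`, and `country`, with several countries comma-separated in one field. These column names are assumptions, not confirmed from the dataset page, and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def titles_per_year(rows):
    """Count titles per (release_year, type) pair.

    Assumes each row is a dict with hypothetical 'release_year' and 'type'
    keys, as produced by csv.DictReader on the dataset's CSV file.
    """
    counts = Counter()
    for row in rows:
        counts[(int(row["release_year"]), row["type"])] += 1
    return counts


def titles_per_country(rows):
    """Count titles per country.

    The hypothetical 'country' field may list several countries separated
    by commas; missing or empty values are skipped.
    """
    counts = Counter()
    for row in rows:
        for country in (row.get("country") or "").split(","):
            country = country.strip()
            if country:
                counts[country] += 1
    return counts


# Made-up sample rows standing in for the real CSV file.
SAMPLE = """type,release_year,country
Movie,2019,United States
TV Show,2019,"India, United States"
Movie,2020,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(titles_per_year(rows))
print(titles_per_country(rows))
```

Splitting the `country` field before counting credits a co-production to every listed country, which suits a coverage question. Counting only the first listed country would be a different, equally defensible choice.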
For the second part of the analysis, we use https://www.kaggle.com/datasets/ashishgup/netflix-rotten-tomatoes-metacritic-imdb
- This dataset combines data from Netflix, Rotten Tomatoes, IMDb, and other sources.
- It includes content summaries and commentary, which we use for NLP analysis.
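One simple starting point for the NLP part is to compare the most frequent summary terms for high-rated titles against those for everything else. The following is a minimal bag-of-words sketch in standard-library Python. The `(summary, score)` input shape, the 7.0 "high rated" cutoff, and the short stop-word list are all hypothetical stand-ins, not the dataset's actual columns or our chosen definition. The sample records are made up.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "his", "her", "is", "on", "with"}


def tokenize(text):
    """Lowercase text, extract word tokens (keeping inner apostrophes), drop stop words."""
    return [w for w in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()) if w not in STOPWORDS]


def top_terms(records, threshold=7.0, n=5):
    """Return the n most common summary terms for titles scoring at or above
    `threshold` and for titles below it.

    `records` is an iterable of (summary, score) pairs; both the pair shape
    and the default threshold are hypothetical.
    """
    high, rest = Counter(), Counter()
    for summary, score in records:
        (high if score >= threshold else rest).update(tokenize(summary))
    return high.most_common(n), rest.most_common(n)


# Made-up records standing in for rows of the second dataset.
records = [
    ("A detective hunts a killer in the city.", 8.1),
    ("A young detective solves a murder.", 7.5),
    ("A comedy about a family road trip.", 5.9),
]

high_terms, other_terms = top_terms(records, n=3)
print(high_terms)
print(other_terms)
```

Raw term counts are only a rough first look. Weighting schemes such as TF-IDF, or a model relating terms to ratings, would be natural next steps if simple frequencies prove too noisy.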