Cluster Analysis of Personal IMDB Ratings to find insights and segment user preference.

DnanaDev/User_Profile_Cluster_Analysis

PCA & K-Means Cluster Analysis of Personal IMDB Ratings

Notebook
Data Source: IMDb account.

Objectives:

  1. Find patterns in my IMDb movie ratings (345 films rated over a period of 5 years) and identify preferences and important features using (i) EDA and (ii) K-Means clustering.
  2. Find interesting clusters of movies using K-Means clustering.
  3. Use as many features as possible: engineer additional features and one-hot encode categorical features such as genres and directors (~275 features in total).
  4. Use Principal Component Analysis (PCA) to reduce the dimensionality of the data and improve the clustering output (120 components capturing 50% of the variance of the data).
  5. Gather insights on preferences by performing hypothesis tests on these clusters with respect to their mean ratings.
    [Figure: Clusters]
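
The encoding, PCA, and K-Means steps listed in the objectives can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the toy data, the column names (`your_rating`, `imdb_rating`, `genre`, and so on), the choice of 4 clusters, the standardisation step, and the decision to leave the personal rating out of the clustering features are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy stand-in for an IMDb ratings export; all columns are assumed.
rng = np.random.default_rng(0)
n_films = 60
films = pd.DataFrame({
    "your_rating": rng.integers(1, 11, n_films),
    "imdb_rating": rng.uniform(4.0, 9.0, n_films).round(1),
    "year": rng.integers(1950, 2020, n_films),
    "runtime_mins": rng.integers(80, 200, n_films),
    "genre": rng.choice(["Drama", "Action", "Comedy", "Sci-Fi"], n_films),
    "director": rng.choice(["Director A", "Director B", "Director C"], n_films),
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
features = pd.get_dummies(
    films.drop(columns="your_rating"),
    columns=["genre", "director"],
    dtype=float,
)

# Standardise so that no single feature dominates the principal components.
scaled = StandardScaler().fit_transform(features)

# A float n_components keeps the fewest components whose cumulative
# explained variance exceeds that fraction (here 50%).
pca = PCA(n_components=0.5, svd_solver="full")
reduced = pca.fit_transform(scaled)

# Cluster the films in the reduced space (4 clusters is an arbitrary choice).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
films["cluster"] = kmeans.fit_predict(reduced)

# Summarise the personal rating per cluster.
print(films.groupby("cluster")["your_rating"].agg(["count", "mean"]))
```

On real data, the number of clusters would normally be chosen with a diagnostic such as an elbow plot or silhouette score rather than fixed in advance.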

Insights:

Statistically significant clusters indicate that the author:
1. Prefers: Prestige/lesser-known dramas and romantic/comedic dramas.
2. Dislikes: Newer action-adventure, sci-fi, and superhero movies, and Hindi comedies.
* The IMDb rating and the popularity of a movie were important positive features, whereas the year of release had an inverse relationship with the movie's rating.
* Some clusters consist of films by directors that the author seems to prefer (cluster mean rating well above the overall mean rating). These include films by Satyajit Ray, Brad Bird, Andrew Stanton, Dean DeBlois, Edgar Wright, Steven Spielberg, Wes Anderson, etc.
* Clusters of directors who make longer-than-average movies were also formed, but they were not found to have a relevant effect on the author's preference. The clusters with the largest positive deviation in popularity were of films by David Fincher and Christopher Nolan.
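
One way to check whether a cluster's mean rating differs significantly from the rest is sketched below. The source does not say which test was used, so Welch's t-test (cluster versus all remaining films) with a Bonferroni correction is an assumption here, as are the synthetic ratings and the column names `cluster` and `your_rating`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic ratings for three hypothetical clusters (not the author's data).
rng = np.random.default_rng(1)
ratings = pd.DataFrame({
    "cluster": np.repeat([0, 1, 2], 40),
    "your_rating": np.concatenate([
        rng.normal(8.0, 1.0, 40),  # a "preferred" cluster
        rng.normal(6.5, 1.0, 40),  # a cluster near the overall mean
        rng.normal(5.0, 1.0, 40),  # a "disliked" cluster
    ]),
})

overall_mean = ratings["your_rating"].mean()

rows = []
for cluster_id, group in ratings.groupby("cluster"):
    in_cluster = group["your_rating"]
    rest = ratings.loc[ratings["cluster"] != cluster_id, "your_rating"]
    # Welch's t-test: does this cluster's mean differ from the other films' mean?
    t_stat, p_value = stats.ttest_ind(in_cluster, rest, equal_var=False)
    rows.append({
        "cluster": cluster_id,
        "mean_diff": in_cluster.mean() - overall_mean,
        "t_stat": t_stat,
        "p_value": p_value,
    })

summary = pd.DataFrame(rows)

# Bonferroni correction, since one test is run per cluster.
alpha = 0.05 / len(summary)
summary["significant"] = summary["p_value"] < alpha
print(summary)
```

A positive `mean_diff` with `significant` set to true would correspond to a preferred cluster, and a negative one to a disliked cluster.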
