Built Clustering model to design a movie recommender based on rating and tags by other users.
- Set up a data science project structure in a new git repository in your GitHub account
- Download the one of the MovieLens datasets from https://grouplens.org/datasets/movielens/
- Load the data set into panda data frames
- Formulate one or two ideas on how the combination of ratings and tags by users helps the data set to establish additional value using exploratory data analysis
- Build one or more clustering models to determine similar movies to recommend using the other ratings and tags of movies by other users as features
- Document your process and results
- Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub
This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from http://movielens.org, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. Users were selected at random for inclusion. All selected users had rated at least 20 movies.Each user is represented by an id and had rated at least 20 movies.
The data are contained in the files links.csv
, movies.csv
, ratings.csv
and tags.csv
.
- use few Clustering models to recommend movies based on ratings and tags by other users.
Using Feature Engineering and exploratory analysis we could use features to the best for training Clustering models and generate Recommendations
- First, we used KMeans Clustering Model, after finding optimal k value which was 20 we could form these clusters:
- Then we used Agglomerative Clustering Model with 20 clusters again and formed these clusters:
- We then built two recommenders each based on above mentioned clustering models.
- American Beauty (1999)
- Good Will Hunting (1997)
- Titanic (1997)
- Eternal Sunshine of the Spotless Mind (2004)
- Beautiful Mind, A (2001)
- Beautiful Mind, A (2001)
- Rob Roy (1995)
- Pearl Harbor (2001)
- Bodyguard, The (1992)
- About Time (2013)
- Project description
- Contains raw data files in csv format
- Jupyter Notebook for Exploratory data analysis, Visualization, Feature Engineering and Clustering.
- plot- popularity of genres
- Plot- Kmeans Clusters
- Plot- Agglomerative Clusters
- Info about Tools, frameworks and libraries required to reproduce the work flow