The main objective of this project is to analyse the data and build a movie recommender system.
In detail, the project walks through importing Python libraries, loading the data into a dataframe, optimising the dataframe, and manipulating the data.
We will divide our work into the following categories:
- Data Analysis
  - Descriptive statistics: provide ground knowledge about the features and the relations within the dataset
  - Visualization: gives an overview of the data and its underlying relations, using plotting libraries such as plotly and seaborn and a word cloud
- Building the Movie Recommendation System
  - Loading the raw data in a separate notebook
  - Creating a pivot table in batches and appending the results to one dataframe for optimisation, and analysing the errors one can encounter when running huge batches
  - Computing the correlation between columns of the data
  - Cleaning up the final movie suggestions
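The batch pivot and correlation steps listed above can be sketched roughly as follows. This is a minimal illustration on made-up toy ratings, not the project's actual code: the helper name `pivot_in_batches`, the `batch_size` value, and the use of `corrwith` are assumptions chosen for the example, and the column names follow the `userId`, `movieId`, `rating` layout of the MovieLens ratings file.

```python
import pandas as pd

# Toy ratings standing in for the real ratings file (values are made up).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 20, 30],
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 1.0, 2.0, 5.0, 2.0, 1.0, 4.0],
})


def pivot_in_batches(df, batch_size):
    """Build a user x movie rating matrix one block of users at a time.

    Hypothetical helper: pivoting all users at once can exhaust memory on
    the full dataset, so each batch is pivoted separately and appended.
    """
    users = df["userId"].unique()
    parts = []
    for start in range(0, len(users), batch_size):
        batch_users = users[start:start + batch_size]
        batch = df[df["userId"].isin(batch_users)]
        parts.append(
            batch.pivot_table(index="userId", columns="movieId", values="rating")
        )
    # concat aligns columns by movieId; movies missing from a batch become NaN.
    return pd.concat(parts).sort_index()


user_movie = pivot_in_batches(ratings, batch_size=2)

# Correlate every movie's rating column with the column of one chosen movie,
# then drop the movie itself and rank the rest from most to least similar.
target = 10
similar = (
    user_movie.corrwith(user_movie[target])
    .drop(target)
    .sort_values(ascending=False)
)
print(similar)
```

On the full dataset the batch size trades memory for speed, and a clean-up step would probably also drop movies with very few ratings before ranking, since correlations computed from a handful of users are unreliable.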
GroupLens Research has collected and made available rating data sets from the MovieLens web site (https://movielens.org). The dataset I’m downloading and using is the “MovieLens 25M Dataset” which includes 25 million reviews. The data sets were collected over various periods of time with the most recent data from 2019.
- MovieLens 25M Dataset: 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users (size: 250 MB). Download: https://grouplens.org/datasets/movielens/25m/
- MovieLens 10M Dataset: 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users (size: 63 MB). Download: https://grouplens.org/datasets/movielens/10m/
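As a rough sketch of loading the ratings into a memory-friendly dataframe, the example below reads a small in-memory stand-in for the ratings file. The sample rows are illustrative, the `userId,movieId,rating,timestamp` header is the layout assumed for `ratings.csv`, and the choice of 32-bit dtypes is an assumption about what "optimising" could look like here rather than the project's confirmed approach.

```python
import io

import pandas as pd

# In-memory stand-in for the real ratings file; rows are illustrative only.
sample_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,296,5.0,1147880044\n"
    "1,306,3.5,1147868817\n"
    "2,296,4.0,1141415820\n"
)

# Assumed optimisation: 32-bit types roughly halve memory versus the
# default int64/float64, and the unused timestamp column is skipped.
dtypes = {"userId": "int32", "movieId": "int32", "rating": "float32"}
ratings = pd.read_csv(sample_csv, usecols=list(dtypes), dtype=dtypes)

print(ratings.dtypes)
print(ratings.memory_usage(deep=True).sum(), "bytes")
```

With the real download, `sample_csv` would be replaced by the path to the extracted ratings file.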