MovieLens_DataSet

The main objective of this project is to analyse the data and create a movie recommender system.
In detail, this project walks through importing Python libraries, loading the data into a dataframe, optimising the dataframe, and manipulating the data.

We will divide our work into the following categories:

Data Analysis

Descriptive statistics: provides baseline knowledge about the features and the relationships within the dataset
Visualization: gives an overview and helps in understanding the underlying relationships in the data, using plotting libraries such as plotly and seaborn, and creating a word cloud.
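As a rough illustration of the descriptive-statistics step, the sketch below summarises a tiny, made-up sample shaped like the MovieLens ratings file. The column names (userId, movieId, rating) follow the published ratings.csv layout; the rows themselves and the variable names are assumptions for illustration, not data or code from this project.

```python
import pandas as pd

# Hypothetical sample rows in the shape of MovieLens ratings (not real data).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 20, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

# Overall summary of the rating column: count, mean, std, min, quartiles, max.
summary = ratings["rating"].describe()

# Per-movie rating count and average, most-rated and best-rated first.
per_movie = (
    ratings.groupby("movieId")["rating"]
    .agg(n_ratings="count", mean_rating="mean")
    .sort_values(["n_ratings", "mean_rating"], ascending=False)
)

print(summary)
print(per_movie)
```

On the full dataset the same two calls give a quick view of how ratings are distributed and which movies have enough ratings to be worth recommending.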

Building Movie Recommendation System

Loading raw data in a separate notebook
Creating a pivot table in batches and appending each batch to the dataframe for optimisation, and analysing the errors one may encounter when running very large batches
Computing correlation between columns of data
Cleaning up the final movie suggestions
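The steps listed above can be sketched roughly as follows. This is a minimal illustration on made-up ratings, not the notebook's actual code: the helper name pivot_in_batches, the batch size, and the sample rows are all assumptions. It pivots the ratings one batch of users at a time, concatenates the pieces, computes the Pearson correlation of every movie column with one chosen movie, and then tidies the result.

```python
import pandas as pd

# Hypothetical ratings: four users, three movies (not real MovieLens rows).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "movieId": [10, 20, 30] * 4,
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 2.0, 1.0, 5.0, 1.0, 2.0, 4.0],
})


def pivot_in_batches(df, batch_size):
    """Build a user x movie rating matrix one batch of users at a time."""
    user_ids = df["userId"].unique()
    pieces = []
    for start in range(0, len(user_ids), batch_size):
        batch_users = user_ids[start:start + batch_size]
        batch = df[df["userId"].isin(batch_users)]
        pieces.append(
            batch.pivot_table(index="userId", columns="movieId", values="rating")
        )
    # concat aligns on movieId; movies absent from a batch are filled with NaN.
    return pd.concat(pieces).sort_index()


user_movie = pivot_in_batches(ratings, batch_size=2)

# Correlate every movie column with the ratings of movie 10.
target = 10
correlations = user_movie.corrwith(user_movie[target])

# Clean up: drop the target itself and undefined values, best matches first.
suggestions = correlations.drop(target).dropna().sort_values(ascending=False)
print(suggestions)
```

Pivoting per batch keeps each intermediate table small; a single pivot over all users and movies at once is the step most likely to exhaust memory on the full dataset.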

GroupLens Research has collected and made available rating data sets from the MovieLens web site (https://movielens.org). This project uses the “MovieLens 25M Dataset”, which includes 25 million ratings. The data sets were collected over various periods of time, with the most recent data from 2019.

MovieLens 25M Dataset: 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users (size: 250MB). It can be downloaded from https://grouplens.org/datasets/movielens/25m/
MovieLens 10M Dataset: 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users (size: 63MB). It can be downloaded from https://grouplens.org/datasets/movielens/10m/
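Once a dataset is downloaded, loading it with compact column types is one way to optimise the dataframe. The sketch below is an assumption about how that might look, not the project's own loader: it reads a small in-memory stand-in for ratings.csv (the rows are made up) and the dtype choices are illustrative. For the real file, a path to the extracted ratings.csv would replace the StringIO object.

```python
import io

import pandas as pd

# In-memory stand-in for ratings.csv; the rows are hypothetical.
sample_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,5.0,1000000000\n"
    "1,20,3.5,1000000100\n"
    "2,10,4.0,1000000200\n"
)

# Narrower types than the int64/float64 defaults reduce memory use.
dtypes = {"userId": "int32", "movieId": "int32", "rating": "float32"}

# usecols skips the timestamp column, which the steps above do not need.
ratings = pd.read_csv(sample_csv, usecols=list(dtypes), dtype=dtypes)

print(ratings.dtypes)
print(ratings.memory_usage(deep=True))
```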
