Movie Genre Prediction

This project aims to predict the genre of movies based on their descriptions using BERT and RoBERTa models. The dataset used for training and evaluation contains movie overviews and their corresponding genres.

Project Overview

Movie genre prediction is a multi-label classification task where each movie description is associated with one or more genres. In this project, we utilize BERT and RoBERTa models to perform this classification task.

Dataset

The dataset used in this project is sourced from Kaggle's Movie Dataset. It contains movie titles, overviews, and genres. The data has been preprocessed to remove missing values and only includes the necessary columns.

rename the dataset as movies.csv!!

The dataset file used in this project is preprocessed_movies.csv, which includes the following columns:

title: The title of the movie
overview: The description of the movie
genres: The genres of the movie

Requirements

To install the required packages, run:

pip install -r requirements.txt

Training

To train the models, you can use the provided training scripts. Here are the steps:

first process the data:

python data_loader.py

python data.py

BERT Model

python BERT_train.py

RoBERTa Model bash

python RoBERTa_train.py

These scripts will train the respective models on the movie dataset and save the trained models in the saved_model directory. The training scripts include progress bars and logging for monitoring the training process.

Prediction

After training the models, you can use the prediction scripts to classify new movie descriptions. Here are the steps:

BERT Model Prediction

python BERT_predict.py

RoBERTa Model Prediction

python RoBERTa_predict.py

The prediction scripts will load the trained models and tokenizer, take a movie description as input, and output the predicted genres along with their probabilities.

Results

The results of the model training and evaluation, including the loss and accuracy curves, are saved and visualized in the training scripts. The evaluation metrics (accuracy, precision, recall, F1 score) are saved in metrics.json.

Acknowledgements

The dataset used in this project is sourced from Kaggle. The models are implemented using the Hugging Face Transformers library. Special thanks to the developers and contributors of the libraries and tools used in this project. markdown Copy code

Explanation

Project Overview:
- Provides a brief description of the project and its objectives.
Dataset:
- Details the source and structure of the dataset used in the project.
Requirements:
- Lists the required Python packages and provides a command to install them using requirements.txt.
Training:
- Explains how to train the models using the provided training scripts.
Prediction:
- Provides instructions on how to use the trained models to predict movie genres for new descriptions.
Results:
- Describes where the results of the training and evaluation are saved and how they are visualized.
Acknowledgements:
- Gives credit to the data source and the libraries used in the project.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.idea		.idea
README.md		README.md
bert_pre.py		bert_pre.py
bert_train.py		bert_train.py
data.py		data.py
data_loader.py		data_loader.py
requirements.txt		requirements.txt
roberta_pre.py		roberta_pre.py
roberta_train.py		roberta_train.py
training_history_bert.png		training_history_bert.png
training_history_roberta.png		training_history_roberta.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Genre Prediction

Table of Contents

Project Overview

Dataset

rename the dataset as movies.csv!!

Requirements

Training

Prediction

Results

Acknowledgements

Explanation

About

Releases

Packages

Languages

escaperwu/Movie_genres_classify

Folders and files

Latest commit

History

Repository files navigation

Movie Genre Prediction

Table of Contents

Project Overview

Dataset

rename the dataset as movies.csv!!

Requirements

Training

Prediction

Results

Acknowledgements

Explanation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages