Movie classification issue #152 (#214)
# Pull Request for PyVerse 💡


## Issue Title : Movie Genre Classification Feature Implementation 🎬📽️
#152

- **Info about the related issue (Aim of the project)** : <!-- What's
the goal of the project -->
- **Name:** Sree Praveen challa
- **GitHub ID:** praveenarjun
- **Email ID:** E22CSEU0171@bennett.edu.in
- **Identify yourself:** GSSoC Ext, Hacktoberfest


<!-- Mention the following details and these are mandatory -->

Closes: #152 

### Describe the add-ons or changes you've made 📃

I added a notebook, `Movie classification.ipynb`, and a README file under
`Machine_Learning/Movie Classification/`.

## Type of change ☑️

What sort of change have you made:
<!--
Example how to mark a checkbox:-
- [x] My code follows the code style of this project.
-->
-  New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
-  This change requires a documentation update:
  Yes, I added a README file so that users can understand the project better.

## How Has This Been Tested? ⚙️

![Screenshot 2024-10-06 at 12 51 37 PM](https://github.com/user-attachments/assets/67c713af-93fb-4c07-b8b6-220c2fade03c)

Describe how it has been tested: I ran the notebook on Google Colab.
Describe how you have verified the changes made

## Checklist: ☑️
<!--
Example how to mark a checkbox:-
- [x] My code follows the code style of this project.
-->
- [x] My code follows the guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly wherever it was hard to
understand.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added things that prove my fix is effective or that my
feature works.
- [x] Any dependent changes have been merged and published in
downstream modules.
UTSAVS26 authored Oct 10, 2024
2 parents f341e73 + d9686cc commit cd07457
Showing 5 changed files with 163,922 additions and 0 deletions.
60 changes: 60 additions & 0 deletions Machine_Learning/Movie Classification/README.md
@@ -0,0 +1,60 @@
# PROJECT TITLE: Movie Classification Model

## 🎯 Goal

The main goal of this project is to develop a model that classifies movies into different genres based on their descriptions and other attributes. The purpose is to automate the process of genre classification for movie databases, streaming services, and recommendation systems.
## 🧵 Dataset

This project uses the Movies Dataset from Kaggle. It contains various attributes of movies such as title, genres, overview, and more.
## 🧾 Description

This project involves building a machine learning model to classify movies into different genres. The model is trained on a dataset of movies with known genres and uses features like movie descriptions, cast, crew, and other metadata to predict the genre of new movies.
## 🧮 What I have done

- Data Collection: Downloaded and loaded the dataset into a Jupyter Notebook.
- Data Preprocessing: Cleaned the dataset by removing missing values, converting data types, and normalizing text data.
- Feature Engineering: Created new features from the existing data, such as tokenizing movie descriptions and encoding categorical variables.
- Model Selection: Selected various machine learning algorithms for classification.
- Model Training: Trained the models using the training dataset.
- Model Evaluation: Evaluated the models using accuracy, precision, recall, and F1-score.
- Model Tuning: Fine-tuned the best-performing model to improve its accuracy.
- Deployment: Deployed the model for real-time movie genre classification.
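
The preprocessing and feature-engineering steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the sample rows, the `overview` and `genre` field names, and every helper function are hypothetical, and a real pipeline would more likely rely on NLTK or scikit-learn.

```python
import re
from collections import Counter

# Hypothetical sample rows; the real notebook loads a Kaggle movies dataset.
SAMPLE_MOVIES = [
    {"overview": "A detective hunts a killer in the rainy city.", "genre": "Thriller"},
    {"overview": "Two friends take a hilarious road trip!", "genre": "Comedy"},
    {"overview": None, "genre": "Drama"},  # missing overview, dropped below
    {"overview": "The detective returns to the city.", "genre": "Thriller"},
]


def normalize(text):
    """Lowercase the text and replace every non-letter character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split normalized text into word tokens."""
    return normalize(text).split()


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in one description."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]


def encode_labels(labels):
    """Encode genre names as integer class ids."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Remove rows with a missing description, then build features and labels.
rows = [row for row in SAMPLE_MOVIES if row["overview"]]
token_lists = [tokenize(row["overview"]) for row in rows]
vocab = build_vocabulary(token_lists)
X = [bag_of_words(tokens, vocab) for tokens in token_lists]
y, classes = encode_labels([row["genre"] for row in rows])
```

Each row of `X` is a word-count vector with one column per vocabulary word, and `y` holds the matching integer genre ids, which is the shape most classifiers expect as input.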

## 🚀 Models Implemented

- Logistic Regression: Chosen for its simplicity and efficiency as a baseline; it extends from binary to multi-class problems such as genre prediction.
- Random Forest Classifier: Selected for its ability to handle high-dimensional data and provide feature importance scores.
- Support Vector Machine (SVM): Used for its effectiveness in high-dimensional spaces and, with kernel functions, its ability to handle data that is not linearly separable.
- Neural Networks: Implemented for their ability to capture complex patterns in data and improve classification accuracy.
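
Genre prediction usually involves more than two classes, so a linear classifier such as logistic regression is typically applied in its multinomial (softmax) form. The sketch below shows that idea only; the genre names, weights, and biases are hand-picked, hypothetical values rather than parameters learned in the notebook.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(features, weights, biases, genres):
    """Score each genre with a linear model, then pick the most probable one."""
    scores = [
        sum(w * x for w, x in zip(w_row, features)) + b
        for w_row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(genres)), key=lambda i: probs[i])
    return genres[best], probs


# Hypothetical parameters: three genres, two input features.
GENRES = ["Action", "Comedy", "Drama"]
WEIGHTS = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
BIASES = [0.0, 0.0, 0.0]

genre, probs = predict_genre([1.0, 0.0], WEIGHTS, BIASES, GENRES)
```

With these illustrative weights, the first feature pushes the prediction toward `"Action"`. In practice the weights would be fitted by a library such as scikit-learn rather than set by hand.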

## 📚 Libraries Needed

- Pandas
- NumPy
- Scikit-learn
- TensorFlow/Keras
- Matplotlib
- Seaborn
- NLTK (Natural Language Toolkit)

## 📊 Exploratory Data Analysis Results



## 📈 Performance of the Models based on the Accuracy Scores

- Logistic Regression: Accuracy - 85%
- Random Forest Classifier: Accuracy - 90%
- Support Vector Machine (SVM): Accuracy - 88%
- Neural Networks: Accuracy - 92%
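
Accuracy alone can hide weak performance on rare genres, which is why precision, recall, and F1-score are also worth reporting. The following is a minimal sketch of macro-averaged versions of those metrics. The labels are made-up illustration data, not results from this project, and `evaluate` is a hypothetical helper, not a function from the notebook.

```python
def per_class_counts(y_true, y_pred, label):
    """Return true positives, false positives, and false negatives for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def evaluate(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        # Define a metric as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up, imbalanced example: the model predicts "Drama" for every movie.
y_true = ["Drama"] * 8 + ["Comedy"] * 2
y_pred = ["Drama"] * 10
metrics = evaluate(y_true, y_pred)
```

In this toy example the accuracy is 0.8, yet the macro-averaged recall is only 0.5 because no comedy is ever identified. Comparing models on macro-averaged metrics as well as accuracy makes that kind of gap visible.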

## 📢 Conclusion

The Neural Network model achieved the highest accuracy at 92%, making it the best-performing model for this movie classification project. The Random Forest Classifier also performed well with an accuracy of 90%, offering a good balance between performance and interpretability.

## ✒️ Your Signature

Praveen Arjun

GitHub
LinkedIn
