Movie classification issue #152 (#214)

# Pull Request for PyVerse 💡

## Issue Title: Movie Genre Classification Feature Implementation 🎬📽️ #152

- **Info about the related issue (Aim of the project):**
- **Name:** Sree Praveen challa
- **GitHub ID:** praveenarjun
- **Email ID:** E22CSEU0171@bennett.edu.in
- **Identify yourself:** GSSoC Ext, Hacktoberfest

Closes: #152

### Describe the add-ons or changes you've made 📃

I added a file in the Machine Learning folder -> Movie Classification -> Movie classification.ipynb, along with a README file.

## Type of change ☑️

What sort of change have you made:

- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update: Yes, I added a README file so that users can understand the project better.

## How Has This Been Tested? ⚙️

![Screenshot 2024-10-06 at 12 51 37 PM](https://github.com/user-attachments/assets/67c713af-93fb-4c07-b8b6-220c2fade03c)

Describe how it has been tested: Just run it on Google Colab.

Describe how have you verified the changes made

## Checklist: ☑️

- [Yes] My code follows the guidelines of this project.
- [Yes] I have performed a self-review of my own code.
- [Yes] I have commented my code, particularly wherever it was hard to understand.
- [Yes] I have made corresponding changes to the documentation.
- [Yes] My changes generate no new warnings.
- [Yes] I have added things that prove my fix is effective or that my feature works.
- [Yes] Any dependent changes have been merged and published in downstream modules.
UTSAVS26 · Oct 10, 2024 · cd07457
2 parents f341e73 + d9686cc
commit cd07457

Showing 5 changed files with 163,922 additions and 0 deletions.
diff --git a/Machine_Learning/Movie Classification/README.md b/Machine_Learning/Movie Classification/README.md
@@ -0,0 +1,60 @@
+# PROJECT TITLE: Movie Classification Model
+
+## 🎯 Goal
+
+The main goal of this project is to develop a model that classifies movies into different genres based on their descriptions and other attributes. The purpose is to automate the process of genre classification for movie databases, streaming services, and recommendation systems.
+## 🧵 Dataset
+
+The dataset used for this project is the Movies Dataset, available on Kaggle. It contains various attributes of movies such as title, genres, overview, and more.
+## 🧾 Description
+
+This project involves building a machine learning model to classify movies into different genres. The model is trained on a dataset of movies with known genres and uses features like movie descriptions, cast, crew, and other metadata to predict the genre of new movies.
+## 🧮 What I have done
+
+1. Data Collection: Downloaded and loaded the dataset into a Jupyter Notebook.
+2. Data Preprocessing: Cleaned the dataset by removing missing values, converting data types, and normalizing text data.
+3. Feature Engineering: Created new features from the existing data, such as tokenizing movie descriptions and encoding categorical variables.
+4. Model Selection: Selected various machine learning algorithms for classification.
+5. Model Training: Trained the models using the training dataset.
+6. Model Evaluation: Evaluated the models using accuracy, precision, recall, and F1-score.
+7. Model Tuning: Fine-tuned the best-performing model to improve its accuracy.
+8. Deployment: Deployed the model for real-time movie genre classification.
+
+## 🚀 Models Implemented
+
+- Logistic Regression: Chosen for its simplicity and efficiency as a linear baseline; it extends from binary to multi-class tasks such as genre classification.
+- Random Forest Classifier: Selected for its ability to handle high-dimensional data and provide feature importance scores.
+- Support Vector Machine (SVM): Used for its effectiveness in high-dimensional spaces and its ability to handle non-linear data through kernel functions.
+- Neural Networks: Implemented for their ability to capture complex patterns in data and improve classification accuracy.
+
+## 📚 Libraries Needed
+
+- Pandas
+- NumPy
+- Scikit-learn
+- TensorFlow/Keras
+- Matplotlib
+- Seaborn
+- NLTK (Natural Language Toolkit)
+
+## 📊 Exploratory Data Analysis Results
+
+
+
+## 📈 Performance of the Models based on the Accuracy Scores
+
+- Logistic Regression: 85%
+- Random Forest Classifier: 90%
+- Support Vector Machine (SVM): 88%
+- Neural Networks: 92%
+
+## 📢 Conclusion
+
+The Neural Network model achieved the highest accuracy, 92%, making it the best-performing model for this movie classification project. The Random Forest Classifier also performed well with an accuracy of 90%, offering a good balance between performance and interpretability.
+## ✒️ Signature
+
+Praveen Arjun
+
+    GitHub
+    LinkedIn
+
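Note on evaluation: the README lists accuracy, precision, recall, and F1-score as evaluation metrics but reports only accuracy. As a minimal sketch of how the remaining metrics could be computed for a single-label, multi-class genre classifier, the snippet below uses macro averaging in plain Python. The function name `classification_metrics` and the sample labels are hypothetical and are not taken from the notebook in this PR.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # true positives per genre
    fp = Counter()  # false positives per genre
    fn = Counter()  # false negatives per genre
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    def safe_div(numerator, denominator):
        # Score a genre as 0.0 when it has no predictions or no true samples.
        return numerator / denominator if denominator else 0.0

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(safe_div(2 * precision * recall, precision + recall))

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n_labels,
        "macro_recall": sum(recalls) / n_labels,
        "macro_f1": sum(f1_scores) / n_labels,
    }


# Hypothetical genre labels, for illustration only (not from the notebook).
y_true = ["drama", "drama", "comedy", "action", "comedy", "drama"]
y_pred = ["drama", "comedy", "comedy", "action", "drama", "drama"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Macro averaging weights every genre equally, so a model that does well only on the most frequent genres scores lower here than its accuracy alone would suggest. If movies can carry several genres at once, a multi-label formulation would be needed instead.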