Movie data Visualization #282 (#311)

# Pull Request for PyVerse 💡

## Issue Title: Movie Data Visualization

- **Info about the related issue (Aim of the project)**: To understand movie data by using data visualization.
- **Name**: Ananya Ravikiran Vastare
- **GitHub ID**: Ananya-vastare
- **Email ID**: ananyarvastare@gmail.com
- **Identify yourself**: GSSOC contributor

Closes: #282 PR

### Describe the add-ons or changes you've made 📃

I created a new project for data visualization, focusing on enhancing the understanding of movie datasets through various visualization techniques.

## Type of change ☑️

What sort of change have you made:

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested? ⚙️

The application has been tested with various movie datasets to ensure that visualizations are generated correctly. User interactions have been validated to confirm that the application responds appropriately to different inputs.

## Checklist: ☑️

- [x] My code follows the guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly wherever it was hard to understand.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added things that prove my fix is effective or that my feature works.
- [ ] Any dependent changes have been merged and published in downstream modules.
UTSAVS26 · Oct 10, 2024 · 0ed4a7f
2 parents a2eb8d8 + 7c558a2
commit 0ed4a7f
Showing 3 changed files with 141 additions and 0 deletions.
diff --git a/DataVizLearnig/Movie Data Visualization/MovieRatings.xlsx b/DataVizLearnig/Movie Data Visualization/MovieRatings.xlsx
diff --git a/DataVizLearnig/Movie Data Visualization/Readme.md b/DataVizLearnig/Movie Data Visualization/Readme.md
@@ -0,0 +1,53 @@
+# Movie Data Analysis Application
+
+## Goal
+To create an interactive application that analyzes movie data, including reviews and meta scores, and presents the information visually through various types of graphs.
+
+## Description
+This project implements a movie data analysis application using Streamlit, allowing users to explore movie ratings and trends through visualizations. Users can select from different graph types to visualize the movie dataset, facilitating a better understanding of the underlying data trends.
+
+## What I Have Done
+1. Developed an interactive user interface using Streamlit for visualizing movie data.
+2. Implemented functions to create various visualizations: bar charts, pie charts, and histograms.
+3. Enabled dynamic data loading from an Excel file containing movie ratings and metadata.
+4. Organized code into functions for better clarity and maintainability.
+5. Incorporated basic error handling for data input and visualization selection.
+
+## Models Implemented
+This project primarily focuses on:
+- **Data Visualization** using Matplotlib and Seaborn libraries.
+- **User Interaction** through a simple browser-based interface powered by Streamlit.
+- **Data Processing** using Pandas for efficient manipulation of movie datasets.
+
+## Libraries Needed
+- `streamlit`
+- `pandas`
+- `matplotlib`
+- `seaborn`
+- `openpyxl` (used by pandas to read `.xlsx` files)
+
+## Usage
+Run the Streamlit application using the following command:
+
+```bash
+streamlit run maincode.py
+# or
+python -m streamlit run maincode.py
+```
+Before running, make sure the path passed to `pd.read_excel` in `maincode.py` points to your copy of `MovieRatings.xlsx`.
+
+## How to Use
+- Start the application using the command above.
+- When the application loads, it shows a title and a short description of the project.
+- The movie dataset is read from the Excel file referenced in `maincode.py`.
+- Choose a visualization type (Barchart, PieChart, Histogram) from the dropdown menu.
+- Click the "Submit" button to generate the selected graph based on the movie data.
+
+## Conclusion
+This Movie Data Analysis Application serves as an educational tool for understanding data visualization techniques and provides a practical introduction to using Streamlit for building interactive applications. Its design is tailored for beginners to explore data analysis concepts effectively while leveraging Python's powerful data processing libraries.
+
+## License
+This project is licensed under the MIT License. See the LICENSE file for more details.
+
+## Contributing
+Feel free to submit issues or pull requests if you have suggestions for improvements or new features!
diff --git a/DataVizLearnig/Movie Data Visualization/maincode.py b/DataVizLearnig/Movie Data Visualization/maincode.py
@@ -0,0 +1,88 @@
+from pathlib import Path
+
+import streamlit as st
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Set the title of the application
+st.title("Movie Data Analysis")
+st.write(
+    "This project analyses movie data such as reviews and meta scores and displays it in the form of graphs."
+)
+
+# Load the dataset from the Excel file located next to this script
+data = pd.read_excel(Path(__file__).resolve().parent / "MovieRatings.xlsx")
+
+# Create a dropdown menu for selecting options
+options = ["Barchart", "PieChart", "Histogram"]
+selected_option = st.selectbox("Choose an option", options)
+
+
+# Function to create and display a bar chart
+def show_barchart():
+    ratings = data["IMDB_Ratings"].value_counts().sort_index()
+    fig = plt.figure(figsize=(10, 6))
+    cmap = plt.get_cmap("Blues")
+    # One shade per bar, starting at 0.3 so the lightest bars remain visible
+    colors = [cmap(0.3 + 0.7 * i / max(len(ratings) - 1, 1)) for i in range(len(ratings))]
+    plt.bar(ratings.index.astype(str), ratings.values, width=0.5, color=colors)
+    plt.xlabel("IMDB Ratings")
+    plt.ylabel("Frequency")
+    plt.title("Frequency of IMDB Ratings")
+    st.pyplot(fig)
+
+
+# Function to create and display a pie chart
+def show_piechart():
+    # Split multi-genre cells into one row per genre without modifying `data`
+    exploded_genres = data["Genre"].str.split(", ").explode()
+    genre_counts = exploded_genres.value_counts()
+
+    # Create a gradient from dark pink to light pink by lowering the alpha value
+    colors = [
+        (1, 0.08, 0.58, 1 - i / len(genre_counts)) for i in range(len(genre_counts))
+    ]
+
+    fig = plt.figure(figsize=(8, 6))
+    plt.pie(
+        genre_counts,
+        colors=colors,
+        autopct="%1.1f%%",
+        labels=genre_counts.index,  # Add genre names as labels
+        shadow=True,
+        wedgeprops={"edgecolor": "black"},  # Black border between slices
+    )
+    plt.title("Genre Distribution of Top 1000 Movies")
+    st.pyplot(fig)
+
+
+# Function to create and display a histogram
+def show_histogram():
+    # Ensure 'Meta_score' is in the correct format
+    values = pd.to_numeric(data["Meta_score"], errors="coerce")
+
+    # Drop entries that are missing or could not be converted to a number
+    values = values.dropna()
+
+    # Create a histogram for Metascore with KDE
+    fig = plt.figure(figsize=(10, 6))
+    sns.histplot(values, bins=30, kde=True, color="orange", stat="density", alpha=0.5)
+    plt.title("Histogram of Metascore with KDE")
+    plt.xlabel("Metascore")
+    plt.ylabel("Density")
+    plt.grid(axis="y")
+
+    st.pyplot(fig)
+
+
+# Run the appropriate function based on user selection
+if st.button("Submit"):
+    if selected_option == "Barchart":
+        show_barchart()
+    elif selected_option == "PieChart":
+        show_piechart()
+    elif selected_option == "Histogram":
+        show_histogram()
+    else:
+        st.error("No valid option selected.")
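
The data preparation behind the pie chart and the histogram in `maincode.py` comes down to two small pandas transformations that can be checked without starting Streamlit. The sketch below is illustrative only and is not code from this PR: the helper names `count_genres` and `clean_scores` and the three-row sample frame are assumptions made up for the example, while the column names `Genre` and `Meta_score` are taken from the diff.

```python
import pandas as pd


def count_genres(df: pd.DataFrame) -> pd.Series:
    """Count how often each genre appears when a cell holds 'A, B, C'."""
    # Split each cell into a list, give every genre its own row, then count.
    # The chain works on a new Series, so df["Genre"] itself is unchanged.
    return df["Genre"].str.split(", ").explode().value_counts()


def clean_scores(df: pd.DataFrame) -> pd.Series:
    """Return Meta_score as numbers, dropping blanks and non-numeric cells."""
    # errors="coerce" turns anything that is not a number into NaN.
    return pd.to_numeric(df["Meta_score"], errors="coerce").dropna()


# Hypothetical sample data, not taken from MovieRatings.xlsx
sample = pd.DataFrame(
    {
        "Genre": ["Drama, Crime", "Drama", "Action, Crime, Drama"],
        "Meta_score": [80, "n/a", None],
    }
)

genre_counts = count_genres(sample)
scores = clean_scores(sample)
print(genre_counts.to_dict())
print(scores.tolist())
```

Because both helpers return new objects, the same loaded frame can feed all three charts, and the counting logic can be tested on a handful of rows before it is wired into a plot.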