This project is about collecting movie data using themoviedb.org's API endpoint and creating a PowerBI Dashboard for visualization.

ongaunjie1/API_DataCollection_and_PowerBI_Visuals

About the Movie Data

| Field Name | Description | Type |
| --- | --- | --- |
| Title | Movie title. | object |
| Overview | Summary of the movie's plot. | object |
| Release Date | Release date of the movie. | object |
| Vote Average | Average rating given by users. | float |
| Vote Count | Number of votes received by the movie. | int |
| Runtime | Duration of the movie in minutes. | int |
| Budget | Budget allocated for the movie. | int |
| Revenue | Revenue generated by the movie. | int |
| Popularity | Popularity score of the movie. | float |
| Production Countries | Countries where the movie was produced. | object |
| Production Companies | Production companies involved in making the movie. | object |
| Genres | Genres of the movie (e.g., Action, Drama, Science Fiction). | object |

  • The data is collected using themoviedb's API endpoint.
  • See the API documentation: https://developer.themoviedb.org/reference/intro/getting-started
  • The dataset contains the top 500 US-affiliated movies for each year from 2010 to 2022, ranked by vote count on themoviedb.
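
As a rough illustration of how one API response could map onto the fields in the table, here is a minimal sketch. It assumes the movie-details JSON exposes `production_countries`, `production_companies`, and `genres` as lists of objects with a `name` key, and that they are stored as comma-separated strings. The helper names (`SCALAR_FIELDS`, `LIST_FIELDS`, `to_row`) and the sample values are hypothetical and not taken from the notebook.

```python
# Hypothetical helper names; the notebook may structure this differently.
SCALAR_FIELDS = [
    "title", "overview", "release_date", "vote_average", "vote_count",
    "runtime", "budget", "revenue", "popularity",
]
LIST_FIELDS = ["production_countries", "production_companies", "genres"]


def join_names(items):
    """Join the 'name' of each entry in a list of dicts (assumed JSON shape)."""
    return ", ".join(item["name"] for item in items or [])


def to_row(details):
    """Flatten one movie-details JSON object into a flat dict, one key per field."""
    row = {key: details.get(key) for key in SCALAR_FIELDS}
    for key in LIST_FIELDS:
        row[key] = join_names(details.get(key))
    return row


# Placeholder values for illustration only, not real data.
sample = {
    "title": "Example Movie",
    "overview": "Placeholder plot summary.",
    "release_date": "2010-01-01",
    "vote_average": 7.5,
    "vote_count": 1000,
    "runtime": 120,
    "budget": 1000000,
    "revenue": 2500000,
    "popularity": 12.3,
    "production_countries": [{"iso_3166_1": "US", "name": "United States of America"}],
    "production_companies": [{"id": 1, "name": "Example Studio"}],
    "genres": [{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}],
}
row = to_row(sample)
print(row["genres"])
```

Flattening the nested lists into plain strings keeps every field representable as a single CSV cell.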

Goal of the project

  • Collect movie data from themoviedb's API endpoint
  • Perform data cleaning and feature engineering on the data
  • Create an interactive multi-page PowerBI dashboard to visualize the data

The steps taken are described below:

Step 1: Collecting data from the API endpoint

  • Refer to data_collection_api.ipynb for the detailed steps

Python Script Overview:

  • The script authenticates with an API key. (You can get one for free after registering an account at themoviedb.)
  • It retrieves movie details, including title, overview, release date, vote average, vote count, runtime, budget, revenue, popularity, production countries, production companies, and genres.
  • Data is collected based on vote counts, sorted in descending order.
  • The script ensures a maximum of 500 movies per year and writes the information to a CSV file.
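
The paging and the 500-movie cap described above can be sketched as follows. This is a minimal sketch, not the notebook's actual code: it assumes the `/discover/movie` endpoint with the `primary_release_year`, `sort_by=vote_count.desc`, and `page` parameters, and the function names `fetch_page` and `top_movies` are hypothetical. How the US-affiliation filter is applied is not shown here. Fields such as runtime, budget, and revenue would presumably need a separate per-movie details request.

```python
API_ROOT = "https://api.themoviedb.org/3"


def fetch_page(api_key, year, page):
    """Fetch one page of discover results for a year, most-voted first."""
    import requests  # imported lazily so the paging logic runs without it

    response = requests.get(
        f"{API_ROOT}/discover/movie",
        params={
            "api_key": api_key,
            "primary_release_year": year,
            "sort_by": "vote_count.desc",
            "page": page,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def top_movies(year, get_page, limit=500):
    """Collect at most `limit` results for `year`, paging until exhausted.

    `get_page(year, page)` must return a dict with a "results" list and,
    optionally, a "total_pages" count.
    """
    movies, page = [], 1
    while len(movies) < limit:
        data = get_page(year, page)
        results = data.get("results", [])
        if not results:
            break  # no more data for this year
        movies.extend(results)
        if page >= data.get("total_pages", page):
            break  # last page reached
        page += 1
    return movies[:limit]
```

With a real key, a call could look like `top_movies(2010, lambda year, page: fetch_page(api_key, year, page))`. Passing the page fetcher as an argument keeps the capping logic testable without network access.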

Usage:

  • Replace the 'API KEY' with your actual themoviedb API key.
api_key = 'API KEY'
  • Set the desired start and end years for data collection.
  • The script writes the collected data to a CSV file.
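
A minimal sketch of the year range and the CSV output, using only the `csv` module. The variable names `start_year` and `end_year`, the helper `write_movies`, and the placeholder rows are assumptions for illustration. The sketch writes to an in-memory buffer so it can run anywhere.

```python
import csv
import io


def write_movies(rows, fieldnames, out):
    """Write dict rows to an open text file, header first."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


start_year, end_year = 2010, 2022
years = range(start_year, end_year + 1)  # end year is inclusive

# Placeholder rows, one per year, standing in for collected movies.
rows = [
    {"title": f"Example {year}", "release_date": f"{year}-01-01", "vote_count": 0}
    for year in years
]

buffer = io.StringIO()
write_movies(rows, ["title", "release_date", "vote_count"], buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

To write a real file instead, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `out`. The `newline=""` argument is what the `csv` documentation recommends to avoid blank lines on Windows.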

Libraries Used:

  • csv: Module for reading and writing CSV files.
  • requests: Module for sending HTTP requests.

Note:

  • themoviedb also offers data for TV shows. Read the documentation if that interests you.

Step 2: Data cleaning and feature engineering

  • Refer to feature_engineering.ipynb for the detailed steps

Notebook Overview

  • Remove rows with values of 0
  • Create new columns: profits, ROI, year, month, day
  • Convert columns to the appropriate data types
  • One-hot-encode the genres column (This is used later in the PowerBI dashboard to create a slicer that filters by movie genre, so users can explore and analyze the dataset by different combinations of genres.)
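
The steps above can be sketched with the standard library alone. The notebook itself may well use a library such as pandas. This sketch also rests on several assumptions: zero values are checked in the budget, revenue, and runtime columns; ROI is defined as profit divided by budget; release dates are ISO formatted; and genres are stored as a comma-separated string. The function names and sample rows are hypothetical.

```python
from datetime import date


def split_genres(text):
    """Split a comma-separated genres string into a list of names."""
    return [g.strip() for g in text.split(",") if g.strip()]


def engineer(rows):
    """Clean rows and add profits, ROI, date parts, and one-hot genre columns."""
    # Assumed rule: drop a row if budget, revenue, or runtime is 0.
    kept = [
        r for r in rows
        if all(int(r[col]) != 0 for col in ("budget", "revenue", "runtime"))
    ]
    all_genres = sorted({g for r in kept for g in split_genres(r["genres"])})
    cleaned = []
    for r in kept:
        budget, revenue = int(r["budget"]), int(r["revenue"])
        released = date.fromisoformat(r["release_date"])
        new = dict(r, budget=budget, revenue=revenue, runtime=int(r["runtime"]))
        new["profits"] = revenue - budget
        new["ROI"] = (revenue - budget) / budget  # assumed definition of ROI
        new["year"], new["month"], new["day"] = (
            released.year, released.month, released.day
        )
        own = set(split_genres(r["genres"]))
        for genre in all_genres:
            new[genre] = int(genre in own)  # 1 if the movie has this genre
        cleaned.append(new)
    return cleaned


# Placeholder rows for illustration only.
raw = [
    {"title": "A", "release_date": "2010-07-16", "runtime": "148",
     "budget": "100", "revenue": "250", "genres": "Action, Science Fiction"},
    {"title": "B", "release_date": "2011-03-04", "runtime": "90",
     "budget": "0", "revenue": "50", "genres": "Drama"},
]
cleaned = engineer(raw)
print(cleaned[0]["profits"], cleaned[0]["ROI"])
```

One 0/1 column per genre lets a movie with several genres match each of them in a slicer, which a single text column cannot do directly.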

Step 3: Load data into PowerBI and create an interactive multi-page dashboard

  • Load the cleaned.csv file into PowerBI and create the dashboard

Dashboard Overview: The dashboard contains 4 pages

  • Main Page: A comprehensive overview of key metrics and trends.
  • Graphs: Visual representations of data trends, enabling in-depth analysis.
  • Movie Details: Dive into specifics with detailed information about each movie.
  • Key Influencers: Discover the factors influencing key aspects.

Dashboard Features:

  • Sidebar Navigation: A user-friendly sidebar with buttons for easy page navigation.
  • Bookmarks: Seamlessly toggle between different states, such as showing or hiding the sidebar.
  • Field Parameters: Effortlessly filter visuals based on different fields, such as top movies by budget, popularity, and more.
  • Simple Slicers: Easily filter movies by release date and genres.
  • Drillthrough: Navigate from a movie title to its movie details page.
  • Key influencers analysis: Uses the key influencers visual to analyze the factors that have the most impact on a particular outcome.

Refer to the images below for a more detailed showcase of the PowerBI dashboard

1) Main page (First Page)

1

Table: Descriptions of the numbered labels

image

2) Graph visualization page (Second Page)

This page contains 4 visuals

  • 1st Visual: Analyzing top production companies based on profits (image)

  • 2nd Visual: Analyzing the relationship between movie metrics (image)

  • 3rd Visual: Movie genre analysis (image)

  • 4th Visual: Trend line plots for all movies (image)

3) Movie details page using the drillthrough feature (Third Page)

image

  • Right-click any movie title in the table and select the drillthrough page.

This opens the movie details page for the selected movie.

image

4) Key influencers page (Fourth Page)

What influences runtime to increase?

image

What influences profit to increase?

image

What influences profit to decrease?

image

What influences budget to decrease?

image

Accessing the Dashboard:

To explore the dashboard, install the PowerBI Desktop app and open the movie_viz.pbix file with it.

  • IMPORTANT NOTE: PowerBI Desktop is required to open the pbix file.
  • The pbix file can be downloaded from this repository.
