sglibova/dsc-phase-1-project

 
 


Microsoft Studios Film Data Analysis

Authors:
Marcos Panyagua
Rachel Edwards
Svitlana Glibova

popcorn_banner

Contents and Data

images contains all images used in our presentation and notebooks.
notebooks contains all independent EDA notebooks created to combine into the final notebook file.

In the folder zippedData are movie datasets from:

presentation.pdf is a PDF file containing a non-technical presentation of our analysis.

Business Problem

Microsoft is planning to venture into film production by opening a new studio and needs actionable insights from movie data to determine which films to produce in order to succeed at the box office.

We will clean, assemble, interpret, and visualize data from the given data sets in order to provide recommendations for business decisions.

Business Understanding

In order to provide specific recommendations, we selected several metrics:

  • Producers
    • Which producers consistently create the best-rated films?
    • Along with their film ratings, which producers have the largest margins between production budget and box office income?
  • Genres
    • Creating movies in popular genres will attract more initial viewers.
    • Comparing film genres by their overhead costs and ROIs can provide insight into which types of films could be profitable.
  • Release Dates
    • Viewership affects a film's overall income, and releasing films during peak seasons or months can increase viewership.
    • Exploring the frequency of film releases throughout the year can also provide insights on when other studios are releasing movies and how to address competing films.

Methods

Exploratory Data Analysis (EDA):

  • We examined each individual data set to find metrics to support the topics we are addressing. We then looked for correlations in dataframes to create subsets of usable data to visualize.

Data Understanding:

  • We found correlations between producers, the titles and genres of the films they produced, and financial information to support our topics.
  • Using descriptive statistics, we were able to better understand the connections between metrics such as genre, release date, and producer.

Evaluation:

  • Producers:
    This graph plots each producer's average budget, average net income, and average movie rating.
    A producer who combines the lowest average budget with the highest average net income and the highest average rating would be the strongest choice.

budgetmean_vs_incomemean
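The per-producer averages behind a graph like this can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the records, the field names (`producer`, `budget`, `gross`, `rating`), and the helper `summarize_producers` are all hypothetical, and it uses only the standard library so it runs without the notebooks' dependencies. Net income is assumed here to mean gross minus production budget.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records (amounts in millions); not the project's real data or schema.
films = [
    {"producer": "Producer A", "budget": 50.0, "gross": 200.0, "rating": 7.8},
    {"producer": "Producer A", "budget": 70.0, "gross": 180.0, "rating": 7.2},
    {"producer": "Producer B", "budget": 150.0, "gross": 300.0, "rating": 6.5},
    {"producer": "Producer B", "budget": 130.0, "gross": 120.0, "rating": 6.1},
    {"producer": "Producer C", "budget": 20.0, "gross": 100.0, "rating": 8.1},
]


def summarize_producers(records):
    """Return mean budget, mean net income, and mean rating per producer."""
    grouped = defaultdict(list)
    for film in records:
        grouped[film["producer"]].append(film)

    summary = {}
    for producer, group in grouped.items():
        summary[producer] = {
            "mean_budget": mean(f["budget"] for f in group),
            # Assumption: net income = gross - production budget.
            "mean_net_income": mean(f["gross"] - f["budget"] for f in group),
            "mean_rating": mean(f["rating"] for f in group),
        }
    return summary


producer_summary = summarize_producers(films)

# Rank by highest mean net income, breaking ties by lower mean budget.
ranked = sorted(
    producer_summary,
    key=lambda p: (
        -producer_summary[p]["mean_net_income"],
        producer_summary[p]["mean_budget"],
    ),
)

for name in ranked:
    print(name, producer_summary[name])
```

Sorting on net income first is only one way to weigh the three averages; ranking on rating or budget instead would surface a different "intersection".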

  • Genres:
    This graph represents the profitability of each genre.
    Genres with the highest historical chance of turning a profit may also indicate higher viewership.

ROI_by_genre
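As a rough sketch of how genre profitability might be computed, the example below assumes ROI is defined as (gross - budget) / budget and that a film's genres arrive as a comma-separated string. Both are assumptions rather than details confirmed by the notebooks, and the records and helper names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records (amounts in millions); not the project's real data or schema.
films = [
    {"title": "Film 1", "genres": "Animation,Adventure", "budget": 100.0, "gross": 400.0},
    {"title": "Film 2", "genres": "Drama", "budget": 40.0, "gross": 30.0},
    {"title": "Film 3", "genres": "Adventure,Fantasy", "budget": 200.0, "gross": 500.0},
    {"title": "Film 4", "genres": "Drama", "budget": 10.0, "gross": 25.0},
]


def roi(film):
    """Return on investment, assumed here to be (gross - budget) / budget."""
    return (film["gross"] - film["budget"]) / film["budget"]


def roi_by_genre(records):
    """Return median ROI and the share of profitable films for each genre."""
    by_genre = defaultdict(list)
    for film in records:
        # A film with several genres counts once toward each of them.
        for genre in film["genres"].split(","):
            by_genre[genre.strip()].append(roi(film))

    return {
        genre: {
            "median_roi": median(values),
            "share_profitable": sum(v > 0 for v in values) / len(values),
        }
        for genre, values in by_genre.items()
    }


genre_stats = roi_by_genre(films)
for genre, stats in sorted(genre_stats.items()):
    print(genre, stats)
```

The median is used instead of the mean because a single breakout hit can otherwise dominate a genre's ROI. The share of profitable films is one way to express a genre's chance of netting positive.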

  • Release Dates:
    This graph plots, for each month, the number of films released against net income. Releasing films during months with high net income but low release frequency could reduce competition and yield higher income.

release_month_scatter
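The comparison of release frequency against net income can be illustrated with a small sketch. The records, field names, and the "above-average income, below-average frequency" rule are assumptions chosen for illustration, not the criteria used in the notebooks.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy records (amounts in millions); not the project's real data or schema.
films = [
    {"release_date": date(2018, 6, 15), "budget": 100.0, "gross": 350.0},
    {"release_date": date(2018, 6, 29), "budget": 80.0, "gross": 200.0},
    {"release_date": date(2018, 7, 13), "budget": 150.0, "gross": 500.0},
    {"release_date": date(2018, 10, 5), "budget": 30.0, "gross": 45.0},
    {"release_date": date(2018, 10, 12), "budget": 20.0, "gross": 30.0},
    {"release_date": date(2018, 10, 26), "budget": 25.0, "gross": 20.0},
]


def monthly_stats(records):
    """Return release count and mean net income for each calendar month."""
    net_by_month = defaultdict(list)
    for film in records:
        net_by_month[film["release_date"].month].append(film["gross"] - film["budget"])
    return {
        month: {"releases": len(nets), "mean_net_income": mean(nets)}
        for month, nets in net_by_month.items()
    }


stats = monthly_stats(films)

# Baselines: averages taken across the months that appear in the data.
avg_releases = mean(s["releases"] for s in stats.values())
avg_net = mean(s["mean_net_income"] for s in stats.values())

# Candidate months: above-average net income with below-average release frequency.
candidate_months = sorted(
    month
    for month, s in stats.items()
    if s["mean_net_income"] > avg_net and s["releases"] < avg_releases
)
print(candidate_months)
```

With real data, the thresholds would need more care than a simple average, since a month with very few releases can show a misleadingly high mean.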

Summary:

Through this data analysis, we are able to provide several recommendations based on our measures of success:

  • We recommend selecting a producer whose previous films show higher profit margins.
  • Animation, fantasy, and adventure films have historically had a higher potential to turn a profit.
  • Summer releases have historically earned the highest net income.

Conclusion:

This data can be used to fine-tune selections based on individual budget, staffing, and style limitations.
Our recommendations aim to minimize budget and maximize net income, but depending on individual goals, the data can also be used to explore other intersections.
