Problem
------------
How do you make a financially successful film?
The film industry continues to reach new heights in global box office revenue. The highest grossing movie of all time is Avatar (2009), which took in a worldwide gross of $2.7 billion. In 2015, worldwide revenue reached $38.3 billion, and it is expected to reach $49.3 billion in 2020. Films, however, can lose as much money as they can make. In 2013, '47 Ronin' grossed $151 million, which would be a respectable amount if not for its estimated production budget of $225 million. Thus, every film made is an enormous gamble for the distributor, studio, and/or production company. The costs begin well before production, and movies often go over budget. Once a film is greenlit, cash is already being spent or is reserved to be spent in the near future.
To properly manage risk, studios must act at the most critical point of the film's lifecycle: the pitch. If a studio knows what kinds of movies make money (or don't), it will be easier to decide which films to invest in. Likewise, if filmmakers know what kinds of movies make money, it will be easier for them to pitch their ideas. The goal of this project is to identify the characteristics that determine a film's financial success and the steps that can be taken to make it more likely. The variables considered will be ones that can be controlled prior to production. A financially successful film will be defined as one that generates revenue at least 20% greater than its budget.
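The success criterion above can be written as a small check. This is only a sketch: the function name, the `margin` parameter, and the choice to return None for a missing or non-positive budget are assumptions for illustration, not part of the proposal.

```python
from typing import Optional


def is_financially_successful(
    budget: Optional[float],
    revenue: Optional[float],
    margin: float = 0.20,
) -> Optional[bool]:
    """Return True if revenue is at least `margin` (20%) above the budget.

    Returns None when either figure is missing or the budget is not
    positive, because the comparison is not meaningful in that case
    (an assumption made for this sketch).
    """
    if budget is None or revenue is None or budget <= 0:
        return None
    return revenue >= budget * (1 + margin)


# '47 Ronin' figures quoted above: $151M gross on an estimated $225M budget.
print(is_financially_successful(225_000_000, 151_000_000))  # False

# Hypothetical film: budget of 100, revenue of 130 (30% above budget).
print(is_financially_successful(100, 130))  # True
```

Returning None instead of False keeps movies with unusable budget figures from being counted as failures; whether to drop or impute such rows is a separate decision.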
Data
----------
Movie data will be obtained from Kaggle's IMDB and TMDB datasets. The IMDB dataset includes data on awards won, and the TMDB dataset contains the rest of the data. Additional info will be obtained from IMDB via the IMDbPY Python package.
TMDB: https://www.kaggle.com/tmdb/tmdb-movie-metadata/data
IMDB: https://www.kaggle.com/orgesleka/imdbmovies/data
Any movie that has won at least one award will be considered an award winning movie. The data extracted for each movie will be the variables that are considered to be under the filmmaker's control:
* Title (do certain words show up in all award winning movies?)
* Runtime (can movies be too long/short to win an award?)
* Genres (do multigenre movies win awards over those that are easier to classify?)
* Director (from the studio executive perspective, are certain directors more reliable for creating an award winning film?)
* Top 3 actors in cast (how influential are actors featured? do award winning movies typically feature the same group of actors?)
* Writers (do certain writers create award winning screenplays?)
* Budget (does a movie have to be expensive to win awards?)
* MPAA Rating (do movies catered specifically to adults fare better than those catered to a wider audience?)
* Release Date (does it matter if movies are released earlier/later in the year?)
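As a rough illustration of how the variables listed above and the award winning label might be derived from a single movie record, consider the sketch below. All field names (`awards_won`, `cast`, `mpaa_rating`, and so on) and the sample record are hypothetical placeholders; the actual Kaggle column names may differ.

```python
from datetime import date


def build_features(movie: dict) -> dict:
    """Turn one raw movie record into the pre-production variables.

    The keys read from `movie` are assumed names, not the real
    column names of the TMDB or IMDB datasets.
    """
    release = date.fromisoformat(movie["release_date"])
    return {
        "title_words": movie["title"].lower().split(),
        "runtime": movie["runtime"],
        "n_genres": len(movie["genres"]),
        "director": movie["director"],
        "top_cast": movie["cast"][:3],          # top 3 billed actors only
        "writers": movie["writers"],
        "budget": movie["budget"],
        "mpaa_rating": movie["mpaa_rating"],
        "release_month": release.month,
        # Label: at least one award counts as award winning.
        "award_winning": movie["awards_won"] >= 1,
    }


# Made-up record, for illustration only.
example = {
    "title": "An Example Film",
    "runtime": 112,
    "genres": ["Drama", "Thriller"],
    "director": "Director A",
    "cast": ["Actor A", "Actor B", "Actor C", "Actor D"],
    "writers": ["Writer A"],
    "budget": 40_000_000,
    "mpaa_rating": "R",
    "release_date": "2010-11-05",
    "awards_won": 2,
}

features = build_features(example)
print(features["n_genres"], features["release_month"], features["award_winning"])
```

Reducing the release date to a month and the genre list to a count are just two possible encodings; the questions in the list could equally be explored with other representations.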
Approach
--------
Exploratory analysis will be conducted to determine the most influential variables. This will be a supervised problem, as the dataset already includes the number of awards each film has won. It will also be a classification problem, since I am only concerned with whether or not a movie will win at least one award. The predictor variables will be the variables listed above. The training data will come from a subset of the IMDB dataset that includes both movies that have won no awards and movies that have won at least one.
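One simple starting point for the exploratory step is to compare the share of award winners across the values of a single variable, and to compute a majority-class baseline that any classifier would need to beat. The sketch below uses made-up rows and hypothetical helper names; it is not the model the project will ultimately use.

```python
from collections import defaultdict


def award_rate_by(rows, key):
    """Share of award winning movies for each distinct value of `key`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [winners, total]
    for row in rows:
        bucket = counts[row[key]]
        bucket[0] += int(row["award_winning"])
        bucket[1] += 1
    return {value: winners / total for value, (winners, total) in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Made-up rows, for illustration only.
rows = [
    {"mpaa_rating": "R", "award_winning": True},
    {"mpaa_rating": "R", "award_winning": False},
    {"mpaa_rating": "PG", "award_winning": False},
    {"mpaa_rating": "PG", "award_winning": False},
]

rates = award_rate_by(rows, "mpaa_rating")
baseline = majority_baseline_accuracy([r["award_winning"] for r in rows])
print(rates)     # {'R': 0.5, 'PG': 0.0}
print(baseline)  # 0.75
```

A large gap in award rate between groups suggests a variable worth keeping as a predictor, while the baseline gives a floor for judging whether a trained classifier has learned anything beyond the class balance.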
Deliverables
------------
* A model that can be used to provide guidance to those in the film industry on how to make a financially successful film
* A paper/slide deck presenting the research