The dataset used in this project was provided by Udacity as part of a task or assignment. You can access the dataset directly via this link. It contains information related to movies, including attributes such as IDs, popularity, budget, revenue, and more.
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Jupyter Notebook
After examining the movie dataset, looking for trends, and asking questions, I summarize my findings in the key points below. First, several factors affect how much money a movie makes:
- Popularity and revenue: movies that are more popular than others tend to earn more revenue.
- Cast: choosing suitable, well-known actors is important for a movie.
- Budget: budget alone does not determine revenue; some movies with modest budgets still earn high revenue.
- Votes: as a movie's votes increase, more people take an interest in it.
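
One way to check relationships like these is to look at how each numeric column correlates with revenue. The snippet below is only a minimal sketch: the rows are made-up values standing in for the real dataset, and the column names (`popularity`, `budget`, `revenue`, `vote_average`) are assumptions about how the file is laid out.

```python
import pandas as pd

# Made-up sample rows for illustration only; not taken from the real dataset.
# Column names are assumed, adjust them to match the actual file.
movies = pd.DataFrame({
    "popularity": [0.5, 1.2, 3.4, 7.8, 9.1],
    "budget": [1_000_000, 50_000_000, 5_000_000, 150_000_000, 20_000_000],
    "revenue": [2_000_000, 30_000_000, 60_000_000, 400_000_000, 500_000_000],
    "vote_average": [5.1, 6.0, 6.8, 7.4, 8.0],
})

# Pearson correlation of every numeric column with revenue,
# dropping the trivial correlation of revenue with itself.
corr_with_revenue = movies.corr()["revenue"].drop("revenue")

print(corr_with_revenue.sort_values(ascending=False))
```

A correlation only describes how two columns move together; it does not show that one causes the other, so the result is a starting point for plots rather than a conclusion.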
Second: release date and its relationship with the data:
The data shows that the number of films produced has grown over time. The film industry has expanded in parallel with developments in technology and the growth of communication tools and the internet, which have made it possible to distribute new films globally. The data also suggests that production companies choose the best time to release their films, as the seasonal pattern of release dates shows.
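
The growth over time and the seasonal pattern can both be read from simple counts of films per year and per month. This is a sketch under assumptions: the `release_date` column name and its date format are guesses, and the five rows are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration; "release_date" is an assumed column name.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D", "E"],
    "release_date": ["1995-06-15", "2005-12-20", "2005-07-04",
                     "2015-12-25", "2015-05-01"],
})

# Parse the text dates so the year and month can be extracted.
movies["release_date"] = pd.to_datetime(movies["release_date"])

# Number of films released in each year, in chronological order.
films_per_year = movies["release_date"].dt.year.value_counts().sort_index()

# Number of films released in each calendar month (1 = January).
films_per_month = movies["release_date"].dt.month.value_counts().sort_index()

print(films_per_year)
print(films_per_month)
```

Plotting `films_per_year` as a line chart and `films_per_month` as a bar chart would make both trends easier to see.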
Third: genres:
In terms of genres, we observe the vast differences in film genres and the successful roles that well-known actors play in a variety of successful movie genres. We also note how popular certain genres become over time and how documentary films end up with zero revenue. We categorize films as successful or failed based on their profitability and genres.
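
The success and failure labelling described above can be sketched as follows. It assumes that profit is simply revenue minus budget, that a film counts as successful when that profit is positive, and that the `genres` column stores several genres in one string separated by `|`. All three are assumptions made for this example, and the rows are invented.

```python
import pandas as pd

# Invented rows; column names and the "|" separator are assumptions.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "genres": ["Action|Adventure", "Drama", "Action|Drama"],
    "budget": [100, 50, 80],
    "revenue": [300, 20, 60],
})

# Profit and a simple success flag (assumed rule: profit above zero).
movies["profit"] = movies["revenue"] - movies["budget"]
movies["successful"] = movies["profit"] > 0

# Split the genre string into a list, then give each genre its own row
# so that a film with two genres is counted once under each of them.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Share of successful films within each genre.
success_rate = by_genre.groupby("genre")["successful"].mean()

print(success_rate)
```

Because a film is counted under every genre it lists, the per-genre rates are not independent of each other.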
How can movie production companies boost their films' earnings?
In my opinion, and based on the data analysis, to maximize the profits from your film, make sure to market it well, select a well-known and accomplished director and cast, and select a good release date, such as the start or end of the year or a vacation period.
Additional research:
I'll continue to look into other variables that influence the box office success of films, such as the production country and language, and I'll work to fill in the data gaps and find the reasons behind them.
Some problems in the dataset were obstacles for my analysis:
- More than half of the values in the [budget, revenue, budget_adj, revenue_adj] columns were zeros.
- The percentage of missing values in the [keywords, tagline] columns was high, which led me to drop both columns without using them in my analysis.
- There was a huge number of outliers in all quantitative columns, which sometimes gave me inaccurate visualizations.
- More than 8 percent of the production companies were missing, which led me to fill them with "unknown", so I lost valuable information that could have been useful in the analysis.
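
The cleaning steps behind these points can be sketched as below. The rows are invented and the `production_companies` column name is an assumption. Treating zeros in the money columns as missing values is one common approach, shown here as a suggestion and not necessarily what was done in the notebook.

```python
import numpy as np
import pandas as pd

# Invented rows; column names are assumed to match the dataset.
movies = pd.DataFrame({
    "budget": [0, 1_000_000, 0],
    "revenue": [0, 5_000_000, 250_000],
    "keywords": [None, "space", None],
    "tagline": [None, None, "A tagline"],
    "production_companies": ["Studio A", None, "Studio B"],
})

money_cols = ["budget", "revenue"]

# Share of zero values per money column, measured before any cleaning.
zero_share = (movies[money_cols] == 0).mean()

# Assumed approach: a zero budget or revenue means "not recorded",
# so replace it with NaN to keep it out of means and plots.
movies[money_cols] = movies[money_cols].replace(0, np.nan)

# Drop the columns that were mostly empty.
movies = movies.drop(columns=["keywords", "tagline"])

# Fill missing production companies with a placeholder label.
movies["production_companies"] = movies["production_companies"].fillna("Unknown")

print(zero_share)
print(movies)
```

Keeping the zeros as NaN instead of deleting the rows means the other columns of those films can still be used in analyses that do not involve money.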