For the Udacity Nanodegree program, I was given a final project: conduct a data analysis and create a file that documents my findings. I started by looking at the dataset and brainstorming which questions I could answer with it. Then I used the pandas and NumPy Python libraries to answer the research questions I was most interested in, and created a report sharing the answers.
I have chosen the data set on movie ratings. The CSV file can be found in the repository.
The analysis requires Python and the following packages:
- NumPy
- pandas
- Matplotlib
- Seaborn
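A minimal sketch of the corresponding import cell. The aliases (`np`, `pd`, `plt`, `sns`) are the common community convention and an assumption here, not something the project prescribes:

```python
# Core numerical and tabular libraries
import numpy as np
import pandas as pd

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
```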
Brainstorm some questions you could answer using the data set you chose, then start answering those questions. You can find some questions in the data set options to help you get started. Try to suggest questions that promote looking at relationships between multiple variables. You should aim to analyze at least one dependent variable and three independent variables in your investigation. Make sure you use NumPy and pandas where they are appropriate!
Specifically, I have focused my project around these questions:
- What is the relationship between a movie's popularity and its budget? Do popular movies have higher budgets?
- Are older movies more popular than newer ones, or vice versa? In which year was the most popular movie released?
- Do popular movies earn the highest revenue? Which movie earned the highest revenue?
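The questions above can be approached with a few pandas operations. The sketch below is illustrative only: the column names (`original_title`, `release_year`, `popularity`, `budget`, `revenue`) are assumptions about the CSV, and the five rows are made-up placeholder values, not data from the project. In the real analysis the frame would come from `pd.read_csv` on the CSV file in the repository.

```python
import pandas as pd

# Toy stand-in for the movie data. Column names and all values are
# assumptions made up for illustration, not taken from the real CSV.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D", "E"],
    "release_year": [1995, 2003, 2010, 2014, 2015],
    "popularity": [1.0, 2.0, 3.0, 8.0, 10.0],
    "budget": [10e6, 30e6, 20e6, 150e6, 120e6],
    "revenue": [30e6, 60e6, 90e6, 500e6, 700e6],
})

# Questions 1 and 3: Pearson correlation of popularity with budget and revenue
corr = movies[["popularity", "budget", "revenue"]].corr()
pop_budget_r = corr.loc["popularity", "budget"]
pop_revenue_r = corr.loc["popularity", "revenue"]

# Question 2: mean popularity per release year, plus the most popular movie
mean_pop_by_year = movies.groupby("release_year")["popularity"].mean()
most_popular = movies.loc[movies["popularity"].idxmax()]

# Question 3: the movie with the highest revenue
top_revenue = movies.loc[movies["revenue"].idxmax()]

print(corr)
print(most_popular["original_title"], most_popular["release_year"])
print(top_revenue["original_title"])
```

A correlation coefficient only summarizes a linear relationship, so a scatter plot of each pair of variables is a useful companion to these numbers.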
Once you have finished analyzing the data, create a report that shares the findings you found most interesting. If you use a Jupyter notebook, share your findings alongside the code you used to perform the analysis. Make sure that your report text is contained in Markdown cells to clearly distinguish your comments and findings from your code work. You should also feel free to use other tools and software to craft your final report, but make sure that you can submit your report as an HTML or PDF file so that it can be opened easily.
This project was based on movie ratings and analyzed which movies were the most popular and how popularity relates to revenue and budget. To sum up, my preliminary findings suggest that a movie's popularity is correlated with its revenue, its budget, and its year of production. The most popular movies earned the highest revenues and had the highest budgets. However, the most popular movie, Jurassic World, did not necessarily have the highest budget. The most popular movies were also the newer ones, with release years of 2014 and 2015. Overall, I looked at how a movie's popularity correlates with its budget and revenue, and I explored which years produced the most popular movies. This project has limitations: because I have not run any statistical tests, I cannot draw definite statistical conclusions. The analysis could be explored further using machine learning and A/B tests.
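One possible way to address the missing statistical test is a permutation test on the correlation coefficient. This is a sketch under stated assumptions, not part of the original analysis: the function name `permutation_pvalue` and its parameters are hypothetical, and the inputs would be two numeric columns such as popularity and budget.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=10_000, seed=0):
    """Return the Pearson r of x and y and a two-sided permutation p-value.

    Hypothetical helper for illustration; not part of the original project.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]

    # Shuffle y to break any real association, then count how often the
    # shuffled correlation is at least as extreme as the observed one.
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value would suggest the observed correlation is unlikely under random pairing, but it would still not establish that a higher budget causes higher popularity.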