Movie Revenue Prediction using Natural Language Processing Techniques
This project presents an approach to predicting a movie’s revenue from its plot summary alone. The training features were restricted to the plot summary in order to (1) allow movie producers to maximize revenue by modifying a movie’s plot before production and (2) explore the extent of the relationship between a movie’s plot summary and its revenue. The movie data were split into 5 revenue classes, each containing an equal number of movies, and supervised machine learning classifiers trained on these classes predicted a plot summary’s revenue class with 29% (± 1%) accuracy. Because the 5 classes are equally sized, random selection would be correct 20% of the time, so this result is an improvement of 9 (± 1) percentage points. It suggests that there is a relationship between a plot summary and a movie’s revenue, which could be very valuable to the entertainment industry. This relationship can be investigated further by training on more data, extracting new features, and/or varying the classifier.
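The construction of equally sized revenue classes and the resulting random-selection baseline can be sketched as follows. This is a minimal illustration, not the project's code. It assumes rank-based equal-frequency binning, and the function names `revenue_classes` and `chance_accuracy` and the toy revenue values are hypothetical.

```python
def revenue_classes(revenues, k=5):
    """Assign each movie to one of k equal-size revenue classes (0 = lowest).

    Hypothetical helper: movies are ranked by revenue and the ranking is cut
    into k contiguous groups. When len(revenues) is divisible by k, every
    class holds exactly len(revenues) // k movies. Ties keep their original order.
    """
    n = len(revenues)
    # Indices of movies sorted from lowest to highest revenue.
    order = sorted(range(n), key=lambda i: revenues[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto class 0..k-1 in equal-sized blocks.
        labels[idx] = rank * k // n
    return labels


def chance_accuracy(k=5):
    # With k equally sized classes, uniform random guessing is correct 1/k of the time.
    return 1.0 / k


# Toy revenues for 10 hypothetical movies (illustrative values only).
revenues = [12, 3, 45, 7, 30, 18, 2, 60, 25, 9]
print(revenue_classes(revenues))           # [2, 0, 4, 1, 3, 2, 0, 4, 3, 1]
print(chance_accuracy())                   # 0.2
print(round(0.29 - chance_accuracy(), 2))  # 0.09
```

With 10 movies and 5 classes, each class receives exactly 2 movies. This balance is why the random-selection baseline is 1/5 = 20%, and why a 29% accuracy corresponds to a 9 percentage-point gain.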