Abstract
Film industries around the world release several hundred movies at a rapid pace and attract audiences of all ages. Every producer is keenly interested in knowing which movies are likely to succeed or flop at the box office, so early prediction of a movie's popularity is of the utmost importance to the film industry. In this study, we examine the factors and hidden patterns that make a movie popular. Previous studies applied machine learning techniques to blog articles, social networks, and social media to predict the success of a movie. They focused on which algorithms predict success best, and paid less attention to the data and attributes of an ongoing movie, viewed from various directions. In this paper, we investigate this perspective and how it relates to the prediction results. The data were collected from the publicly available Internet Movie Database (IMDb). We implemented five machine learning algorithms, namely Generalized Linear Model (GLM), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT), with Root Mean Squared Error (RMSE) as the performance metric, and obtained scores of GLM: 47.9%, DL: 51.1%, DT: 54.5%, RF: 50.0%, and GBT: 49.5%, respectively. GLM was the best-performing regression model because it had the lowest RMSE, and a lower RMSE indicates better predictions.
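The model comparison described in the abstract rests on RMSE, where the model with the lowest value is preferred. The following is a minimal sketch of that selection rule only, not the authors' pipeline: the ratings, the predictions, and the names `model_a` and `model_b` are hypothetical placeholders, not data or models from the study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical IMDb-style ratings and model outputs (illustrative only).
actual = [7.0, 5.0, 8.0, 6.0]
predictions = {
    "model_a": [6.5, 5.5, 7.5, 6.5],  # every prediction is off by 0.5
    "model_b": [8.0, 4.0, 9.0, 5.0],  # every prediction is off by 1.0
}

# Score each model, then pick the one with the smallest RMSE.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)

# model_a has the lower RMSE (0.5 versus 1.0), so it is selected.
print(best, scores[best])
```

Because RMSE squares each error before averaging, a few large misses are penalized more heavily than many small ones, which is one reason it is a common choice for comparing regression models.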
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant numbers 91630206 and 61572434) and the National Key R&D Program of China (grant number 2017YFB0701501).
Cite this article
Abidi, S.M.R., Xu, Y., Ni, J. et al. Popularity prediction of movies: from statistical modeling to machine learning techniques. Multimed Tools Appl 79, 35583–35617 (2020). https://doi.org/10.1007/s11042-019-08546-5
DOI: https://doi.org/10.1007/s11042-019-08546-5