Popularity prediction of movies: from statistical modeling to machine learning techniques

Multimedia Tools and Applications

Abstract

Film industries around the world release several hundred movies at a rapid pace and attract audiences of all ages. Every producer wants to know which movies are likely to be hits or flops at the box office, so early prediction of a movie's popularity is of the utmost importance to the film industry. In this study, we examine the factors and hidden patterns that make a movie popular. Past studies applied machine learning techniques to blog articles, social networking, and social media to predict the success of a movie. Those works focused on which algorithms predict success best and paid less attention to the data and attributes related to an ongoing movie. In this paper, we examine that perspective and how it relates to the prediction results. The data were collected from the publicly available Internet Movie Database (IMDb). We implemented five machine learning algorithms, namely Generalized Linear Model (GLM), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT), using Root Mean Squared Error (RMSE) as the performance metric, and obtained scores of GLM: 47.9%, DL: 51.1%, DT: 54.5%, RF: 50.0%, and GBT: 49.5%. GLM performed best of the five regression models because it has the lowest RMSE, and a lower RMSE is better.
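
As a rough illustration of the metric named in the abstract, the sketch below computes RMSE, the square root of the mean squared difference between actual and predicted values, and then selects the model with the lowest reported score. The function name `rmse` and the toy ratings are assumptions made for illustration; only the five scores in `reported` come from the abstract, and this is not the authors' implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy ratings, invented for illustration only (not from the paper's data).
actual = [7.0, 5.0, 8.0, 6.0]
predicted = [6.0, 5.0, 9.0, 6.0]
toy_error = rmse(actual, predicted)  # sqrt((1 + 0 + 1 + 0) / 4)

# Scores as reported in the abstract, in percent.
reported = {"GLM": 47.9, "DL": 51.1, "DT": 54.5, "RF": 50.0, "GBT": 49.5}

# A lower RMSE is better, so the best model is the one with the minimum score.
best_model = min(reported, key=reported.get)
```

Because RMSE squares each error before averaging, a few large misses raise the score more than many small ones, which is one reason it is a common choice for comparing regression models.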

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant numbers 91630206 and 61572434) and the National Key R&D Program of China (grant number 2017YFB0701501).

Author information

Corresponding authors

Correspondence to Syed Muhammad Raza Abidi or Wu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abidi, S.M.R., Xu, Y., Ni, J. et al. Popularity prediction of movies: from statistical modeling to machine learning techniques. Multimed Tools Appl 79, 35583–35617 (2020). https://doi.org/10.1007/s11042-019-08546-5

  • DOI: https://doi.org/10.1007/s11042-019-08546-5
