Popularity prediction of movies: from statistical modeling to machine learning techniques

Abidi, Syed Muhammad Raza; Xu, Yonglin; Ni, Jianyue; Wang, Xiangmeng; Zhang, Wu

doi:10.1007/s11042-019-08546-5

Published: 06 January 2020

Volume 79, pages 35583–35617, (2020)

Multimedia Tools and Applications

Syed Muhammad Raza Abidi ORCID: orcid.org/0000-0001-8808-0882¹,
Yonglin Xu¹,
Jianyue Ni¹,
Xiangmeng Wang¹ &
Wu Zhang¹,²

Abstract

Film industries around the world release hundreds of movies at a rapid pace and attract audiences of all ages. Every movie producer is keenly interested in knowing which movies are likely to be hits or flops at the box office, so early prediction of a movie's popularity is of the utmost importance to the film industry. In this study, we examine the factors, hidden in patterns in the data, that make a movie popular. Past studies applied machine learning techniques to blog articles, social networks, and social media to predict the success of a movie. Those works focused on which algorithms predict success best and paid less attention to the data and attributes of an ongoing movie, considered from various directions. In this paper, we investigate that perspective and how it relates to the prediction results. The data were collected from the publicly available Internet Movie Database (IMDb). We implemented five machine learning algorithms, namely Generalized Linear Model (GLM), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT), with Root Mean Squared Error (RMSE) as the performance metric, and obtained scores of GLM: 47.9%, DL: 51.1%, DT: 54.5%, RF: 50.0%, and GBT: 49.5%, respectively. GLM performed best among the five models because it achieved the lowest RMSE, and a lower RMSE is better.
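The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes only the standard definition of RMSE (the square root of the mean squared difference between observed and predicted values) and the five scores quoted above. The function names `rmse` and `best_model` are hypothetical and do not come from the paper, and the sketch does not reproduce the authors' pipeline or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_model(scores):
    """Return the model name with the lowest error score (lower is better)."""
    return min(scores, key=scores.get)


# Scores exactly as quoted in the abstract (percent).
reported_rmse = {"GLM": 47.9, "DL": 51.1, "DT": 54.5, "RF": 50.0, "GBT": 49.5}

if __name__ == "__main__":
    print(best_model(reported_rmse))
```

Because RMSE is an error measure, the selection uses `min` rather than `max`, which is why GLM, with the smallest reported value, is identified as the best of the five models.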

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant numbers 91630206 and 61572434) and the National Key R&D Program of China (grant number 2017YFB0701501).

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, No. 99, Shangda Road, Baoshan Campus, Baoshan District, Shanghai, 200444, China
Syed Muhammad Raza Abidi, Yonglin Xu, Jianyue Ni, Xiangmeng Wang & Wu Zhang
Shanghai Institute of Applied Mathematics and Mechanics, Shanghai University, No. 99, Shangda Road, Baoshan Campus, Baoshan District, Shanghai, 200444, China
Wu Zhang

Corresponding authors

Correspondence to Syed Muhammad Raza Abidi or Wu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abidi, S.M.R., Xu, Y., Ni, J. et al. Popularity prediction of movies: from statistical modeling to machine learning techniques. Multimed Tools Appl 79, 35583–35617 (2020). https://doi.org/10.1007/s11042-019-08546-5

Received: 24 January 2019
Revised: 09 October 2019
Accepted: 27 November 2019
Published: 06 January 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-019-08546-5
