Predicting movie Box-office revenues by exploiting large-scale social media content

Multimedia Tools and Applications

Abstract

Predicting the box-office revenue of a movie before its theatrical release is an important but challenging problem that calls for sophisticated artificial-intelligence techniques. Social media has shown predictive power in various domains, which motivates us to exploit social media content to predict box-office revenues. In this study, we employ both linear and non-linear regression models, based on the crowd wisdom of social media and especially on user posts, to predict movie box-office revenues. More specifically, the attention and popularity of a movie, the purchase intention of users, and user comments are automatically mined from social media data. We explore the use of Linear Regression and Support Vector Regression for predicting the box-office revenue of a movie before its theatrical release, and we evaluate the approach in a cross-validation experiment. The experimental results show that large-scale social media content is correlated with movie box-office revenues and that the purchase intention of users can lead to more accurate revenue predictions. Both the linear and the non-linear models proved effective at predicting movie grosses in our experiments.
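
As a rough illustration of the evaluation setup described in the abstract, the following minimal sketch fits an ordinary least-squares linear regression and scores it with k-fold cross-validation. It is not the authors' implementation: the three feature columns (stand-ins for attention, purchase intention, and comment sentiment), the synthetic data, the weights in `TRUE_W`, and all function names are assumptions made for this example, and the Support Vector Regression model is omitted for brevity.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, x):
    """Intercept w[0] plus the weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def cross_val_mae(xs, ys, folds=5):
    """Mean absolute error, averaged over the held-out folds."""
    errors = []
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        test = [i for i in range(len(xs)) if i % folds == f]
        w = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        fold_err = sum(abs(predict(w, xs[i]) - ys[i]) for i in test) / len(test)
        errors.append(fold_err)
    return sum(errors) / folds

# Synthetic, made-up data: three hypothetical social-media features per movie.
rng = random.Random(0)
TRUE_W = [2.0, 0.5, 1.5, 0.8]  # hypothetical intercept and feature weights
features = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(50)]
# Revenue is noiseless by construction, so the held-out error should be close to zero.
revenue = [predict(TRUE_W, x) for x in features]

mae = cross_val_mae(features, revenue)
print(f"cross-validated MAE: {mae:.6f}")
```

With real data, the feature rows would come from the mined social media signals and the targets from observed grosses; a non-linear model such as Support Vector Regression could then be scored with the same cross-validation loop for comparison.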

Notes

  1. http://www.weibo.com/

  2. http://www.mtime.com/

  3. http://www.imdb.com/

  4. http://movie.douban.com/

  5. http://www.wangpiao.com/

  6. The Baidu trends of the director

  7. The Baidu trends of the main actors

Acknowledgment

Ting Liu: model building, experiment design, paper writing

Xiao Ding: model building, experiment design, paper writing

Yiheng Chen: model building, experiment design

Haochen Chen: data collection, experiment design

Maosheng Guo: data collection

Author information

Correspondence to Xiao Ding.

About this article

Cite this article

Liu, T., Ding, X., Chen, Y. et al. Predicting movie Box-office revenues by exploiting large-scale social media content. Multimed Tools Appl 75, 1509–1528 (2016). https://doi.org/10.1007/s11042-014-2270-1

  • DOI: https://doi.org/10.1007/s11042-014-2270-1
