Modelling and predicting news popularity

Hensinger, Elena; Flaounas, Ilias; Cristianini, Nello

doi:10.1007/s10044-012-0314-6

Short Paper
Published: 21 December 2012

Volume 16, pages 623–635, (2013)

Pattern Analysis and Applications

Abstract

We explore the problem of learning to predict the popularity of an article in online news media. By “popular” we mean an article that was among the “most read” articles of a given day in the news outlet that published it. We show that this cannot be modelled simply as a binary classification task that separates popular from unpopular articles, since doing so assumes that popularity is an absolute property. Instead, we propose to view popularity from the perspective of a competitive situation in which the popular articles are those that were the most appealing on that particular day. This leads to the notion of an “appeal” function, which we model as a linear function in the bag-of-words representation. The parameters of this linear function are learnt from a training set formed by pairs of documents: one that was popular and another that appeared on the same page and date without becoming popular. To learn the appeal function we use Ranking Support Vector Machines, with data collected from six different outlets over a period of 1 year. We show that our method can predict which articles will become popular and can extract the keywords that most affect the appeal function. This also enables us to compare different outlets from the point of view of their readers’ preference patterns. Remarkably, this is achieved using very limited information, namely the textual content of the title and description of each article, the page and date of publication, and whether it became popular.
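
The pairwise formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a whitespace tokeniser, invented toy headlines, and plain subgradient descent on a regularised pairwise hinge loss in place of a full Ranking SVM solver. The function names (`bag_of_words`, `appeal`, `train_pairwise`) and the hyperparameter values are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to sparse term counts (lowercased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def appeal(weights, text):
    """Linear appeal score: dot product of the weights with the term counts."""
    return sum(weights.get(term, 0.0) * count
               for term, count in bag_of_words(text).items())


def train_pairwise(pairs, epochs=50, lr=0.1, reg=0.01):
    """Learn weights so that each popular article outscores its paired
    unpopular article by a margin of at least 1.

    pairs: list of (popular_text, unpopular_text), both taken from the
    same page and date. Subgradient descent on the L2-regularised
    pairwise hinge loss is an assumed stand-in for a Ranking SVM solver.
    """
    weights = {}
    for _ in range(epochs):
        for popular, unpopular in pairs:
            # Difference vector: counts of the popular text minus the unpopular one.
            diff = bag_of_words(popular)
            diff.subtract(bag_of_words(unpopular))
            margin = sum(weights.get(t, 0.0) * c for t, c in diff.items())
            # Regularisation step: shrink all weights slightly.
            for t in weights:
                weights[t] *= 1.0 - lr * reg
            # Hinge step: update only when the pair violates the margin.
            if margin < 1.0:
                for t, c in diff.items():
                    weights[t] = weights.get(t, 0.0) + lr * c
    return weights


# Invented toy data: (popular, unpopular) titles from the same page and day.
pairs = [
    ("royal wedding draws crowds", "council budget meeting postponed"),
    ("celebrity scandal shocks fans", "quarterly trade figures released"),
    ("royal baby name revealed", "council approves parking budget"),
]

weights = train_pairwise(pairs)

# Keywords with the largest positive and negative influence on appeal.
ranked_terms = sorted(weights, key=weights.get, reverse=True)
print("most appealing terms:", ranked_terms[:3])
print("least appealing terms:", ranked_terms[-3:])
```

Only the sign and relative size of the learnt weights are meaningful here: a term with a large positive weight raises the appeal score of any article containing it, which is how the keywords that most affect the appeal function can be read off the model. Comparing `appeal(weights, a)` with `appeal(weights, b)` for two unseen articles from the same page and day then gives a predicted preference between them.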

Acknowledgments

This research was supported by the PASCAL2 Network of Excellence and by the European FP7 project “Complacs” (FP7/2007-2013 under grant agreement no 270327).

Author information

Authors and Affiliations

Intelligent Systems Laboratory, University of Bristol, Bristol, UK
Elena Hensinger, Ilias Flaounas & Nello Cristianini

Corresponding author

Correspondence to Elena Hensinger.

About this article

Cite this article

Hensinger, E., Flaounas, I. & Cristianini, N. Modelling and predicting news popularity. Pattern Anal Applic 16, 623–635 (2013). https://doi.org/10.1007/s10044-012-0314-6

Received: 11 May 2011
Accepted: 22 November 2012
Published: 21 December 2012
Issue Date: November 2013
DOI: https://doi.org/10.1007/s10044-012-0314-6
