Clickbait Detection

Potthast, Martin; Köpsel, Sebastian; Stein, Benno; Hagen, Matthias

doi:10.1007/978-3-319-30671-1_72

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link. Clickbait is primarily used by online content publishers to increase their readership, whereas its automatic detection will give readers a way of filtering their news stream. We contribute by compiling the first clickbait corpus of 2992 Twitter tweets, 767 of which are clickbait, and by developing a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
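To make the three reported measures concrete, the following is a minimal sketch of how precision, recall, and ROC-AUC can be computed from a classifier's scores. It is not the authors' evaluation code: the labels, scores, and the 0.5 decision threshold are invented toy values, the function names are hypothetical, and the metrics are computed for the clickbait class only, whereas the abstract does not say which averaging convention the reported figures use.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = clickbait)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # clickbait, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not clickbait, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # clickbait, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive example is scored
    above a random negative one; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumption, not taken from the paper's corpus).
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

# Hypothetical decision threshold of 0.5 turns scores into hard predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(labels, preds)
auc = roc_auc(labels, scores)
print(f"precision={precision:.2f} recall={recall:.2f} roc_auc={auc:.2f}")
```

Precision and recall depend on the chosen threshold, while ROC-AUC summarizes the ranking quality over all thresholds, which is why the abstract reports the AUC "at" a particular precision and recall.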

Author information

Authors and Affiliations

Bauhaus-Universität Weimar, Weimar, Germany
Martin Potthast, Sebastian Köpsel, Benno Stein & Matthias Hagen

Corresponding author

Correspondence to Martin Potthast.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

Potthast, M., Köpsel, S., Stein, B., Hagen, M. (2016). Clickbait Detection. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_72

DOI: https://doi.org/10.1007/978-3-319-30671-1_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)
