Abstract
This paper proposes a new model for detecting clickbait, i.e., short messages that lure readers into clicking a link. Clickbait is used primarily by online content publishers to increase their readership, whereas its automatic detection would give readers a way to filter their news streams. We contribute (1) the first clickbait corpus, comprising 2992 tweets, 767 of which are clickbait, and (2) a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
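The abstract reports ROC-AUC together with precision and recall. Only about a quarter of the corpus is clickbait (767 of 2992 tweets, roughly 25.6%), so a threshold-free ranking measure such as ROC-AUC complements precision and recall, which describe a single operating point. The following is a minimal standard-library Python sketch of how these three measures can be computed for a binary clickbait classifier. It is not the authors' evaluation code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values rather than data from the corpus.

```python
from __future__ import annotations

from typing import Sequence


def precision_recall(labels: Sequence[int], preds: Sequence[int]) -> tuple[float, float]:
    """Precision and recall for the positive class (1 = clickbait, 0 = not clickbait)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy gold labels and classifier scores (hypothetical, not from the corpus).
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    # Assumed decision threshold of 0.5 turns scores into hard predictions.
    preds = [1 if s >= 0.5 else 0 for s in scores]
    p, r = precision_recall(labels, preds)
    print(round(p, 3), round(r, 3), round(roc_auc(labels, scores), 3))
```

On the toy data this prints `0.667 0.667 0.933`. The pairwise ROC-AUC loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation scales better for larger corpora.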
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Potthast, M., Köpsel, S., Stein, B., Hagen, M. (2016). Clickbait Detection. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_72
DOI: https://doi.org/10.1007/978-3-319-30671-1_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)