Clickbait Detection

  • Conference paper
Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link. Clickbait is primarily used by online content publishers to increase their readership, whereas its automatic detection will give readers a way of filtering their news stream. We contribute by compiling the first clickbait corpus of 2992 tweets, 767 of which are clickbait, and by developing a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
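
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and ROC-AUC are defined for a binary clickbait/non-clickbait task. It is not the authors' evaluation code. The labels and scores are hypothetical toy values, the function names are illustrative, and the 0.5 decision threshold is an assumption. Only the corpus counts (767 of 2992) come from the abstract, which does not say how precision and recall were averaged across classes.

```python
# Illustrative sketch only: toy labels and scores, not the paper's data or features.

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = clickbait)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive is
    scored above a randomly chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class balance stated in the abstract: 767 clickbait tweets out of 2992.
clickbait_share = 767 / 2992

# Hypothetical classifier scores for eight toy tweets (1 = clickbait).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"share={clickbait_share:.3f} precision={precision:.2f} "
      f"recall={recall:.2f} auc={auc:.2f}")
```

Because only about a quarter of the corpus is clickbait, a threshold-independent measure such as ROC-AUC is a useful complement to precision and recall at a single operating point.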

Author information

Corresponding author

Correspondence to Martin Potthast.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Potthast, M., Köpsel, S., Stein, B., Hagen, M. (2016). Clickbait Detection. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_72

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_72

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
