CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-teaching

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)

Abstract

Clickbait thumbnails on video-sharing platforms (e.g., YouTube, Dailymotion) are small, catchy images designed to entice users to click through to the linked videos. Despite their usefulness, the videos that users land on after clicking are often inconsistent with what the thumbnails advertise, which degrades the user experience and undermines the reputation of the platforms. In this work, we therefore develop a computational solution, named CHECKER, to detect clickbait thumbnails with high accuracy. Because the definition of a clickbait thumbnail is fuzzy, and high-quality labeled samples are consequently hard to create, the industry has not yet dealt with clickbait thumbnails adequately. To address this challenge, CHECKER shares a novel clickbait thumbnail dataset and codebase with the industry, and exploits (1) a weak supervision framework to generate many noisy-but-useful labels and (2) a co-teaching framework to learn robustly from such noisy labels. We also investigate how to detect clickbait on video-sharing platforms using both thumbnails and titles, exploiting recent advances in vision-language models. In the empirical validation, CHECKER outperforms five baselines by at least 6.4% in F1-score and 4.2% in AUC-ROC. The codebase and dataset from our paper are available at https://github.com/XPandora/CHECKER.
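
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the CHECKER implementation: the function names, the label constants, and the `forget_rate` parameter are hypothetical, and a simple majority vote stands in for the learned label model that a weak supervision framework would typically use. The first function combines the votes of several heuristic labeling functions into one noisy label; the other two show the small-loss selection step of co-teaching, in which each of two networks keeps the samples it finds easiest and hands them to its peer for the next update.

```python
from typing import Dict, List, Sequence

# Hypothetical label constants for this sketch.
ABSTAIN, NORMAL, CLICKBAIT = -1, 0, 1


def majority_vote(votes: Sequence[int]) -> int:
    """Combine labeling-function votes into one noisy label.

    Assumption: a plain majority vote replaces a learned label model.
    Returns ABSTAIN on a tie or when every function abstains.
    """
    pos = sum(1 for v in votes if v == CLICKBAIT)
    neg = sum(1 for v in votes if v == NORMAL)
    if pos == neg:
        return ABSTAIN
    return CLICKBAIT if pos > neg else NORMAL


def small_loss_indices(losses: Sequence[float], forget_rate: float) -> List[int]:
    """Return indices of the (1 - forget_rate) fraction with the smallest loss."""
    num_keep = int(round((1.0 - forget_rate) * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:num_keep])


def co_teaching_exchange(
    losses_a: Sequence[float], losses_b: Sequence[float], forget_rate: float
) -> Dict[str, List[int]]:
    """Each network selects its small-loss samples; the peer trains on them."""
    keep_a = small_loss_indices(losses_a, forget_rate)
    keep_b = small_loss_indices(losses_b, forget_rate)
    return {"train_a_on": keep_b, "train_b_on": keep_a}


if __name__ == "__main__":
    print(majority_vote([CLICKBAIT, CLICKBAIT, NORMAL]))
    print(co_teaching_exchange([0.9, 0.1, 0.5, 0.2], [0.2, 0.8, 0.1, 0.7], 0.5))
```

The cross-exchange matters because samples with large loss are more likely to carry wrong labels; letting each network filter the batch for the other limits how much either one memorizes its own mistakes.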

Part of the work was done while the author visited Penn State during the summer of 2019 as an intern.

Notes

  1. http://tiny.cc/3jkvtz

  2. https://developers.google.com/youtube/v3

  3. https://cloud.google.com/vision/docs/ocr

  4. http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html

  5. https://github.com/sloria/textblob

Acknowledgement

The work of Thai Le and Dongwon Lee was supported in part by NSF awards #1742702, #1820609, #1909702, #1915801, #1934782, and #2114824.

Corresponding author

Correspondence to Tianyi Xie.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, T., Le, T., Lee, D. (2021). CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-teaching. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_26

  • DOI: https://doi.org/10.1007/978-3-030-86517-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86516-0

  • Online ISBN: 978-3-030-86517-7

  • eBook Packages: Computer Science, Computer Science (R0)