CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-teaching

Xie, Tianyi; Le, Thai; Lee, Dongwon

doi:10.1007/978-3-030-86517-7_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Clickbait thumbnails on video-sharing platforms (e.g., YouTube, Dailymotion) are small, catchy images designed to entice users to click and view the linked videos. Despite their usefulness, the videos that users land on after clicking are often inconsistent with what the thumbnails advertised, causing a poor user experience and undermining the reputation of the platforms. In this work, we therefore aim to develop a computational solution, named CHECKER, to detect clickbait thumbnails with high accuracy. Because the definition of clickbait thumbnails is fuzzy and high-quality labeled samples are consequently hard to create, the industry has not coped with clickbait thumbnails adequately. To address this challenge, CHECKER shares a novel clickbait thumbnail dataset and codebase with the industry, and exploits (1) a weak supervision framework to generate many noisy-but-useful labels and (2) a co-teaching framework to learn robustly from such noisy labels. Moreover, we investigate how to detect clickbait on video-sharing platforms using both thumbnails and titles, exploiting recent advances in vision-language models. In the empirical validation, CHECKER outperforms five baselines by at least 6.4% in F1-score and 4.2% in AUC-ROC. The codebase and dataset from our paper are available at: https://github.com/XPandora/CHECKER.
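The two ideas named in the abstract can be illustrated with a small sketch. It is not the CHECKER implementation: the majority vote below is only a simple stand-in for the label model of a weak supervision framework, and the second part shows just the small-loss sample exchange at the core of co-teaching, without any actual networks. All function names, votes, and loss values are hypothetical and chosen for illustration.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion


def majority_vote(votes):
    """Combine labeling-function votes (0, 1, or ABSTAIN) into one noisy label."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, n_top), *rest = counts.most_common()
    # A tie between classes is treated as "no label" rather than guessed.
    if rest and rest[0][1] == n_top:
        return ABSTAIN
    return top


def small_loss_indices(losses, keep_ratio):
    """Indices of the keep_ratio fraction of samples with the smallest loss."""
    n_keep = max(1, int(len(losses) * keep_ratio))
    return sorted(range(len(losses)), key=lambda i: losses[i])[:n_keep]


def co_teaching_exchange(losses_a, losses_b, keep_ratio):
    """Each network picks its small-loss samples; its peer trains on them."""
    train_b_on = small_loss_indices(losses_a, keep_ratio)  # chosen by A, used by B
    train_a_on = small_loss_indices(losses_b, keep_ratio)  # chosen by B, used by A
    return train_a_on, train_b_on


# Hypothetical votes from three labeling functions on four thumbnails
# (1 = clickbait, 0 = not clickbait).
votes = [[1, 1, ABSTAIN], [0, 1, 0], [ABSTAIN, ABSTAIN, ABSTAIN], [1, 0, ABSTAIN]]
noisy_labels = [majority_vote(v) for v in votes]

# Hypothetical per-sample losses from two networks on one mini-batch.
losses_a = [0.2, 1.5, 0.4, 0.9]
losses_b = [0.3, 0.1, 2.0, 0.8]
train_a_on, train_b_on = co_teaching_exchange(losses_a, losses_b, keep_ratio=0.5)
```

The intuition behind the exchange is that samples with small loss are more likely to carry correct labels, so each network filters the noisy batch for its peer instead of for itself, which limits how much either one reinforces its own mistakes.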

Part of the work was done while the author visited Penn State during the summer of 2019 as an intern.

Acknowledgement

The work of Thai Le and Dongwon Lee was supported in part by NSF awards #1742702, #1820609, #1909702, #1915801, #1934782, and #2114824.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Tianyi Xie
The Pennsylvania State University, University Park, USA
Thai Le & Dongwon Lee

Corresponding author

Correspondence to Tianyi Xie.

Editor information

Editors and Affiliations

Facebook AI, Seattle, WA, USA
Yuxiao Dong
Torre Telefonica, Barcelona, Spain
Nicolas Kourtellis
Bielefeld University, CITEC, Bielefeld, Germany
Barbara Hammer
Basque Center for Applied Mathematics, Bilbao, Spain
Jose A. Lozano

About this paper

Cite this paper

Xie, T., Le, T., Lee, D. (2021). CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-teaching. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_26

DOI: https://doi.org/10.1007/978-3-030-86517-7_26
Published: 10 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7
eBook Packages: Computer Science, Computer Science (R0)
