Clickbait Detection Based on Word Embedding Models

Vorakitphan, Vorakit; Leu, Fang-Yie; Fan, Yao-Chung

doi:10.1007/978-3-319-93554-6_54

Conference paper
First Online: 08 June 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 773)

Abstract

In recent years, social networking platforms have served as a new medium for news sharing and information diffusion, and they have become part of our daily life. As a result, social media advertising budgets have expanded explosively worldwide over the past few years. Because of the large commercial interest involved, clickbait is commonly observed: it uses attractive headlines and sensationalized textual descriptions to lure users into visiting websites. Clickbait mainly exploits users' curiosity gap, using intriguing headlines to entice readers to click a link to an article that often has poor content. Clickbait is bothersome both to social media users and to platform owners. In this paper, we propose an approach called the Ontology-based LSTM Model (OLSTM) to detect clickbait. Compared with existing solutions for clickbait detection, our approach is characterized by three components: a word embedding model, Recurrent Neural Networks (RNN), and word ontology information. Our observation is that preserving semantic relationships is an important factor in detecting clickbait. Therefore, we propose to capture semantic relationships between words with word embedding models. In addition, we adopt an RNN as our classification model so that word order in a sentence is taken into account. Furthermore, we use word ontology relations as another feature set for clickbait classification, since clickbait often uses words with generalized concepts to induce curiosity. We conduct experiments with real data from Twitter and news websites to validate the effectiveness of the proposed approach. The results demonstrate that the proposed method improves clickbait detection accuracy from 80% to 90% compared with existing solutions.
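
The abstract combines three ingredients: word embeddings that capture semantic relationships, an RNN (LSTM) that reads the words of a headline in order, and an ontology feature reflecting how general the words are. The following is a minimal, hypothetical Python sketch of how such pieces could be wired together in a forward pass; it is not the authors' OLSTM implementation. The hand-made embeddings, the toy ontology, the rule that depth <= 1 counts as a "general" concept, and all weights are illustrative assumptions, and training is omitted.

```python
import math
import random

# Hypothetical hand-made 4-dimensional embeddings; a real system would load
# pre-trained vectors (e.g. word2vec) instead. Unknown words map to zeros.
EMBEDDINGS = {
    "you": [0.1, 0.9, 0.0, 0.2],
    "won't": [0.0, 0.8, 0.1, 0.3],
    "believe": [0.2, 0.7, 0.0, 0.1],
    "this": [0.1, 0.6, 0.2, 0.0],
    "thing": [0.3, 0.5, 0.4, 0.0],
    "senator": [0.9, 0.1, 0.2, 0.6],
    "dog": [0.2, 0.1, 0.9, 0.1],
    "cat": [0.25, 0.15, 0.85, 0.1],
}
DIM = 4

# Hypothetical toy ontology: word -> more general parent concept (hypernym).
ONTOLOGY = {"dog": "animal", "cat": "animal", "animal": "thing",
            "senator": "person", "person": "thing"}
CONCEPTS = set(ONTOLOGY) | set(ONTOLOGY.values())


def embed(word):
    return EMBEDDINGS.get(word, [0.0] * DIM)


def cosine(u, v):
    """Cosine similarity: related words should have nearby vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def depth(word):
    """Number of hypernym hops up to the ontology root (0 = most general)."""
    d = 0
    while word in ONTOLOGY:
        word = ONTOLOGY[word]
        d += 1
    return d


def generality(tokens):
    """Assumed feature: share of ontology-known tokens with depth <= 1."""
    known = [t for t in tokens if t in CONCEPTS]
    if not known:
        return 0.0
    return sum(depth(t) <= 1 for t in known) / len(known)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (forward pass only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.hidden_dim = hidden_dim

    def _affine(self, gate, z):
        return [sum(w * x for w, x in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def encode(self, vectors):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:  # words are consumed in order, so order matters
            z = x + h  # concatenate current input with previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("c", z)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


def clickbait_score(headline, cell, w_hidden, w_onto, bias=0.0):
    """Logistic output over [final LSTM state ; ontology feature]."""
    tokens = headline.lower().split()
    h = cell.encode([embed(t) for t in tokens])
    logit = (sum(w * x for w, x in zip(w_hidden, h))
             + w_onto * generality(tokens) + bias)
    return sigmoid(logit)
```

For example, `clickbait_score("You won't believe this thing", LSTMCell(DIM, 3), [0.5, -0.2, 0.3], 1.0)` returns a probability between 0 and 1; with untrained weights the value itself is meaningless. Appending the ontology feature to the final LSTM state before the logistic layer is just one simple way to treat it as an extra feature set; the abstract does not specify how OLSTM combines them.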

Acknowledgement

This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under grant number 106-2221-E-005-082-, and was also partially supported by Project H367B83300, conducted by ITRI under the sponsorship of the Ministry of Economic Affairs, Taiwan, R.O.C.

Author information

Authors and Affiliations

National Chung Hsing University, Taichung, Taiwan
Vorakit Vorakitphan & Yao-Chung Fan
Tung-Hai University, Taichung, Taiwan
Fang-Yie Leu

Corresponding author

Correspondence to Yao-Chung Fan.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Technical University of Catalonia, Barcelona, Spain
Fatos Xhafa
Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan
Nadeem Javaid
Rissho University, Tokyo, Japan
Tomoya Enokido

About this paper

Cite this paper

Vorakitphan, V., Leu, FY., Fan, YC. (2019). Clickbait Detection Based on Word Embedding Models. In: Barolli, L., Xhafa, F., Javaid, N., Enokido, T. (eds) Innovative Mobile and Internet Services in Ubiquitous Computing. IMIS 2018. Advances in Intelligent Systems and Computing, vol 773. Springer, Cham. https://doi.org/10.1007/978-3-319-93554-6_54

DOI: https://doi.org/10.1007/978-3-319-93554-6_54
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93553-9
Online ISBN: 978-3-319-93554-6
eBook Packages: Intelligent Technologies and Robotics (R0)