
Clickbait Detection Based on Word Embedding Models

  • Conference paper
  • First Online:

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 773)

Abstract

In recent years, social networking platforms have become a new medium for news sharing and information diffusion, as well as part of our daily lives. As a result, social media advertising budgets have expanded explosively worldwide over the past few years. This commercial interest has made clickbait common: attractive headlines and sensationalized textual descriptions are used to lure users into visiting websites. Clickbaits mainly exploit users' curiosity gap. They use intriguing headlines to entice readers to click a link to an article that often has poor content. Clickbaits are a nuisance to both social media users and platform owners. In this paper, we propose an approach called the Ontology-based LSTM Model (OLSTM) to detect clickbaits. Compared with existing clickbait detection solutions, our approach is characterized by three components: a word embedding model, Recurrent Neural Networks (RNNs), and word ontology information. Our observation is that preserving semantic relationships is an important factor in detecting clickbaits. Therefore, we capture semantic relationships between words with word embedding models. In addition, we adopt RNNs as our classification models so that word order within a sentence is taken into account. Furthermore, we use word ontology relations as an additional feature set for clickbait classification, since clickbaits often use words with generalized concepts to induce curiosity. We conduct experiments on real data from Twitter and news websites to validate the effectiveness of the proposed approach. The results show that the proposed method improves clickbait detection accuracy from 80% to 90% compared with existing solutions.
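To make the three components concrete, here is a minimal, untrained sketch in Python of one way they could fit together. Word embeddings feed an LSTM, whose final hidden state depends on word order. An ontology-derived "generality" score is then appended before a logistic output. Everything specific here is a hypothetical assumption and not the authors' OLSTM configuration. That includes the toy vocabulary and the 4-dimensional random embeddings, which stand in for pretrained vectors such as word2vec. It also includes the hand-written `CONCEPT_DEPTH` table (a stand-in for a real word ontology), the depth `threshold`, and all weights.

```python
import math
import random

# All sizes, words, and weights below are hypothetical toy values,
# not the configuration used in the OLSTM paper.
EMB_DIM = 4
HIDDEN = 3
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy word embeddings; a real system would load pretrained vectors (e.g. word2vec).
VOCAB = ["you", "won't", "believe", "this", "thing",
         "president", "signs", "trade", "bill"]
EMBEDDINGS = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in VOCAB}
UNK = [0.0] * EMB_DIM  # zero vector for out-of-vocabulary words

# Hypothetical stand-in for a word ontology: depth of each word's concept
# in a hierarchy. A smaller depth means a more general concept.
CONCEPT_DEPTH = {"this": 1, "thing": 1, "president": 5, "trade": 4, "bill": 4}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# One weight matrix per LSTM gate, applied to the concatenation [x_t ; h_{t-1}].
GATE_W = {name: rand_matrix(HIDDEN, EMB_DIM + HIDDEN) for name in ("i", "f", "o", "g")}
GATE_B = {name: [0.0] * HIDDEN for name in GATE_W}


def lstm_encode(tokens):
    """Run a single-layer LSTM over the tokens and return the final hidden state."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for tok in tokens:
        z = EMBEDDINGS.get(tok, UNK) + h  # list concatenation: [x_t ; h_{t-1}]
        pre = {name: [a + b for a, b in zip(matvec(GATE_W[name], z), GATE_B[name])]
               for name in GATE_W}
        i = [sigmoid(v) for v in pre["i"]]       # input gate
        f = [sigmoid(v) for v in pre["f"]]       # forget gate
        o = [sigmoid(v) for v in pre["o"]]       # output gate
        cand = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


def generality_feature(tokens, threshold=2):
    """Fraction of ontology-known tokens whose concept depth is <= threshold."""
    depths = [CONCEPT_DEPTH[t] for t in tokens if t in CONCEPT_DEPTH]
    if not depths:
        return 0.0
    return sum(1 for d in depths if d <= threshold) / len(depths)


# Output layer: logistic regression over [h_T ; generality].
W_OUT = [rng.uniform(-1, 1) for _ in range(HIDDEN + 1)]
B_OUT = 0.0


def clickbait_score(headline):
    """Untrained probability-like score that a headline is clickbait."""
    tokens = headline.lower().split()
    features = lstm_encode(tokens) + [generality_feature(tokens)]
    return sigmoid(sum(w * x for w, x in zip(W_OUT, features)) + B_OUT)


print(clickbait_score("You won't believe this thing"))
```

The weights are random rather than trained, so the scores themselves carry no meaning. The point of the sketch is the data flow. Reordering the same words changes the LSTM's final state, which is the word-order sensitivity the abstract gives as the reason for using RNNs. The generality feature, by contrast, is independent of word order.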

Acknowledgement

This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under grant number 106-2221-E-005-082-, and partially by Project H367B83300, conducted by ITRI under the sponsorship of the Ministry of Economic Affairs, Taiwan, R.O.C.

Author information

Corresponding author

Correspondence to Yao-Chung Fan.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Vorakitphan, V., Leu, FY., Fan, YC. (2019). Clickbait Detection Based on Word Embedding Models. In: Barolli, L., Xhafa, F., Javaid, N., Enokido, T. (eds) Innovative Mobile and Internet Services in Ubiquitous Computing. IMIS 2018. Advances in Intelligent Systems and Computing, vol 773. Springer, Cham. https://doi.org/10.1007/978-3-319-93554-6_54
