Spam detection on social networks using deep contextualized word representation

Multimedia Tools and Applications

Abstract

Spam detection on social networks, treated as a short-text classification problem, is a challenging task in natural language processing because the text is sparse and ambiguous. A key requirement for addressing this problem is a powerful text representation. Traditional word embedding models mitigate data sparsity by representing words as dense vectors, but they have limitations that prevent them from handling some problems effectively. The most common limitation is the “out of vocabulary” problem: the models cannot provide a vector representation for words that are absent from the model’s dictionary. Another limitation is context independence: the models output a single vector for each word regardless of where the word appears in the sentence. To overcome these problems, we propose a model based on deep contextualized word representation. In this study, we develop CBLSTM (Contextualized Bi-directional Long Short Term Memory neural network), a novel deep learning architecture based on a bidirectional long short-term memory neural network with Embeddings from Language Models (ELMo), to address the problem of spam text on social networks. Experimental results on three benchmark datasets show that the proposed method achieves high accuracy and outperforms existing state-of-the-art methods for detecting spam on social networks.
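
The two limitations named in the abstract can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' method: the vocabulary, the vector values, and the helper names (`static_lookup`, `char_vector`, `toy_contextual`) are all hypothetical, and the "contextual" function is a simple neighbour average that stands in for a real contextual encoder. It does not use the language-model embeddings or the CBLSTM architecture described above.

```python
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical static embedding table; the values are invented for illustration.
STATIC_VECTORS: Dict[str, Vector] = {
    "free": [0.9, 0.1],
    "prize": [0.8, 0.2],
    "call": [0.5, 0.5],
    "me": [0.1, 0.9],
    "now": [0.3, 0.7],
}


def static_lookup(token: str) -> Optional[Vector]:
    """Return the one stored vector for a token, or None if it is out of vocabulary."""
    return STATIC_VECTORS.get(token.lower())


def char_vector(token: str) -> Vector:
    """Build a vector from characters alone, so any string gets a representation.

    The two features (share of letters, share of digits) are an assumption
    chosen only to keep the example small.
    """
    n = max(len(token), 1)
    letters = sum(ch.isalpha() for ch in token) / n
    digits = sum(ch.isdigit() for ch in token) / n
    return [letters, digits]


def toy_contextual(tokens: List[str]) -> List[Vector]:
    """Average each token's character vector with its immediate neighbours.

    This makes the output for a word depend on the sentence it appears in,
    which is the property a static lookup table lacks.
    """
    base = [char_vector(t) for t in tokens]
    out: List[Vector] = []
    for i in range(len(base)):
        window = base[max(i - 1, 0): i + 2]
        out.append([sum(v[d] for v in window) / len(window) for d in range(2)])
    return out


sentence_a = "call me now".split()
sentence_b = "call 0800 now".split()

# Out of vocabulary: the obfuscated spelling has no entry in the static table.
print(static_lookup("fr33"))

# Context independence: "call" maps to the same static vector in both sentences.
print(static_lookup(sentence_a[0]) == static_lookup(sentence_b[0]))

# The toy contextual vectors for "call" differ because the neighbours differ.
print(toy_contextual(sentence_a)[0], toy_contextual(sentence_b)[0])
```

The character-based fallback mirrors, very loosely, why character-aware models can still represent unseen or deliberately misspelled tokens, which are common in spam text.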

Notes

  1. http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.

References

  1. Aiyar S, Shetty NP (2018) N-gram assisted youtube spam comment detection. Procedia Comput Sci 132:174–182

  2. Alberto TC, Lochter JV, Almeida TA (2015) Tubespam: Comment spam filtering on youtube. In: IEEE 14th international conference on machine learning and applications (ICMLA). IEEE

  3. Almeida TA, Hidalgo JMG, Yamakami A (2011) Contributions to the study of SMS spam filtering: new collection and results. In: Proceedings of the 11th ACM symposium on Document engineering

  4. Ameen AK, Kaya B (2018) Spam detection in online social networks by deep learning. In: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). IEEE

  5. Barushka A, Hajek P (2020) Spam detection on social networks using cost-sensitive feature selection and ensemble-based regularized deep neural networks. Neural Comput Appl 32(9):4239–4257

  6. Chaudhary V, Sureka A (2013) Contextual feature based one-class classifier approach for detecting video response spam on Youtube. In: Eleventh Annual Conference on Privacy, Security and Trust. IEEE

  7. Chen W et al (2015) Real-time twitter content polluter detection based on direct features. In: 2015 2nd International Conference on Information Science and Security (ICISS). IEEE

  8. Chen W et al (2017) A study on real-time low-quality content detection on Twitter from the users’ perspective. PLoS One 12(8):e0182487

  9. Chowdury R et al (2013) A data mining based spam detection system for youtube. In: Eighth International Conference on Digital Information Management (ICDIM). IEEE

  10. Egele M et al (2015) Towards detecting compromised accounts on social networks. IEEE Trans Dependable Secur Comput 14(4):447–460

  11. El-Mawass N, Alaboodi S (2015) Hunting for spammers: Detecting evolved spammers on twitter. arXiv preprint arXiv:1512.02573

  12. Gao Y et al (2014) Brand data gathering from live social media streams. In: Proceedings of International Conference on Multimedia Retrieval

  13. Gupta H et al (2018) A framework for real-time spam detection in Twitter. In: 2018 10th International Conference on Communication Systems & Networks (COMSNETS). IEEE

  14. Hidalgo JMG, Zurutuza U (2017) Novel comment spam filtering method on Youtube: sentiment analysis and personality recognition. In: Current Trends in Web Engineering: ICWE 2017 International Workshops, Liquid Multi-Device Software and EnWoT, practi-O-web, NLPIT, SoWeMine, Rome, Italy, June 5–8, Revised Selected Papers. Springer, Berlin

  15. Ilić S et al (2018) Deep contextualized word representations for detecting sarcasm and irony. arXiv preprint arXiv:1809.09795

  16. Jain G, Sharma M, Agarwal B (2019) Optimizing semantic LSTM for spam detection. Int J Inform Technol 11(2):239–250

  17. Jain G, Sharma M, Agarwal B (2019) Spam detection in social media using convolutional and long short term memory neural network. Ann Math Artif Intell 85(1):21–44

  18. Kandasamy K, Koroth P (2014) An integrated approach to spam classification on Twitter using URL analysis, natural language processing and machine learning techniques. In: 2014 IEEE Students’ Conference on Electrical, Electronics and Computer Science. IEEE

  19. Kanodia S, Sasheendran R, Pathari V (2018) A novel approach for youtube video spam detection using markov decision process. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE

  20. Lee S, Kim J (2014) Early filtering of ephemeral malicious accounts on Twitter. Comput Commun 54:48–57

  21. Madisetty S, Desarkar MS (2018) A neural network-based ensemble approach for spam detection in Twitter. IEEE Trans Comput Social Syst 5(4):973–984

  22. Mateen M et al (2017) A hybrid approach for spam detection for Twitter. In: 2017 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE

  23. McCann B et al (2017) Learned in translation: Contextualized word vectors. arXiv preprint arXiv:1708.00107

  24. Miller Z et al (2014) Twitter spammer detection using data stream clustering. Inf Sci 260:64–73

  25. Peters ME et al (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365

  26. Samsudin NAM et al (2019) Youtube spam detection framework using naïve bayes and logistic regression. Indonesian J Electr Eng Comput Sci 14(3):1508–1517

  27. Tur G, Homsi MN (2017) Cost-sensitive classifier for spam detection on news media Twitter accounts. In: 2017 XLIII Latin American Computer Conference (CLEI). IEEE

  28. Uysal AK (2018) Feature selection for comment spam filtering on YouTube. Data Sci Appl 1(1):4–8

  29. Wang F et al (2016) Logo information recognition in large-scale social media data. Multimed Syst 22(1):63–73

  30. Wang J et al (2017) Combining knowledge with deep convolutional neural networks for short text classification. In: IJCAI

  31. Watcharenwong N, Saikaew K (2017) Spam detection for closed Facebook groups. In: 14th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE

  32. Wu T et al (2017) Detecting spamming activities in twitter based on deep-learning technique. Concurr Comput Pract Exp 29(19):e4209

  33. Yang C, Harkreader R, Gu G (2013) Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans Inf Forensics Secur 8(8):1280–1293

  34. Zhao S et al (2017) Real-time multimedia social event detection in microblog. IEEE Trans Cybern 48(11):3218–3231

  35. Zheng X et al (2015) Detecting spammers on social networks. Neurocomputing 159:27–34


Acknowledgements

The authors would like to thank Chen et al. [8], Alberto et al. [2], and Almeida et al. [3] for sharing their datasets.

Author information

Corresponding author

Correspondence to Razan Ghanem.

Ethics declarations

Conflict of interest

None of the authors of this paper has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Razan Ghanem's alternative name is Rezan Bakir.

About this article

Cite this article

Ghanem, R., Erbay, H. Spam detection on social networks using deep contextualized word representation. Multimed Tools Appl 82, 3697–3712 (2023). https://doi.org/10.1007/s11042-022-13397-8

  • DOI: https://doi.org/10.1007/s11042-022-13397-8
