Spam detection on social networks using deep contextualized word representation

Ghanem, Razan; Erbay, Hasan

doi:10.1007/s11042-022-13397-8

Published: 14 July 2022

Multimedia Tools and Applications, volume 82, pages 3697–3712 (2023)

Abstract

Spam detection on social networks, considered a short text classification problem, is a challenging task in natural language processing due to the sparsity and ambiguity of the text. A key requirement for addressing this problem is a powerful text representation. Traditional word embedding models solve the data sparsity problem by representing words with dense vectors, but they have limitations that prevent them from handling some problems effectively. The most common limitation is the “out of vocabulary” problem, in which a model fails to provide any vector representation for words that are not present in its dictionary. Another limitation is context independence: the model outputs a single vector for each word regardless of the word’s position in the sentence. To overcome these problems, we propose a new model based on deep contextualized word representation. In this study, we develop CBLSTM (Contextualized Bi-directional Long Short Term Memory neural network), a novel deep learning architecture based on a bidirectional long short term memory neural network with Embeddings from Language Models (ELMo), to address the spam text problem on social networks. The experimental results on three benchmark datasets show that our proposed method achieves high accuracy and outperforms the existing state-of-the-art methods for detecting spam on social networks.
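The two limitations of traditional word embeddings named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the vocabulary, the two-dimensional vectors, and the helper names `lookup` and `embed` are all hypothetical and chosen only to show the behavior of a fixed lookup table.

```python
from typing import Dict, List, Optional

# Hypothetical toy vocabulary; the vectors are invented for illustration
# and do not come from any trained model.
STATIC_EMBEDDINGS: Dict[str, List[float]] = {
    "free": [0.9, 0.1],
    "bank": [0.2, 0.7],
    "river": [0.1, 0.8],
    "account": [0.6, 0.3],
}


def lookup(word: str) -> Optional[List[float]]:
    """Return the stored vector, or None if the word is out of vocabulary."""
    return STATIC_EMBEDDINGS.get(word.lower())


def embed(sentence: str) -> List[Optional[List[float]]]:
    """Embed a whitespace-tokenized sentence one word at a time."""
    return [lookup(token) for token in sentence.split()]


# Limitation 1 (out of vocabulary): an obfuscated spelling, of the kind
# common in spam, has no entry in the table, so no vector is available.
oov_vector = lookup("fr33")

# Limitation 2 (context independence): "bank" receives the same vector
# in both sentences, although its sense differs.
bank_near_river = embed("river bank")[1]
bank_near_account = embed("bank account")[0]
same_vector = bank_near_river == bank_near_account
```

In this sketch `oov_vector` is `None` and `same_vector` is `True`. A contextualized representation, by contrast, computes each word vector from the whole sentence, so the two occurrences of "bank" would generally differ.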

Notes

http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.

References

Aiyar S, Shetty NP (2018) N-gram assisted youtube spam comment detection. Procedia Comput Sci 132:174–182
Alberto TC, Lochter JV, Almeida TA (2015) Tubespam: Comment spam filtering on youtube. In: IEEE 14th international conference on machine learning and applications (ICMLA). IEEE
Almeida TA, Hidalgo JMG, Yamakami A (2011) Contributions to the study of SMS spam filtering: new collection and results. In: Proceedings of the 11th ACM symposium on Document engineering
Ameen AK, Kaya B (2018) Spam detection in online social networks by deep learning. In: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). IEEE
Barushka A, Hajek P (2020) Spam detection on social networks using cost-sensitive feature selection and ensemble-based regularized deep neural networks. Neural Comput Appl 32(9):4239–4257
Chaudhary V, Sureka A (2013) Contextual feature based one-class classifier approach for detecting video response spam on Youtube. In: Eleventh Annual Conference on Privacy, Security and Trust. IEEE
Chen W et al (2015) Real-time twitter content polluter detection based on direct features. In: 2015 2nd International Conference on Information Science and Security (ICISS). IEEE
Chen W et al (2017) A study on real-time low-quality content detection on Twitter from the users’ perspective. PLoS One 12(8):e0182487
Chowdury R et al (2013) A data mining based spam detection system for youtube. In: Eighth International Conference on Digital Information Management (ICDIM). IEEE
Egele M et al (2015) Towards detecting compromised accounts on social networks. IEEE Trans Dependable Secur Comput 14(4):447–460
El-Mawass N, Alaboodi S (2015) Hunting for spammers: Detecting evolved spammers on twitter. arXiv preprint arXiv:1512.02573
Gao Y et al (2014) Brand data gathering from live social media streams. In: Proceedings of International Conference on Multimedia Retrieval
Gupta H et al (2018) A framework for real-time spam detection in Twitter. In: 2018 10th International Conference on Communication Systems & Networks (COMSNETS). IEEE
Hidalgo JMG, Zurutuza U (2017) Novel comment spam filtering method on Youtube: sentiment analysis and personality recognition. In: Current Trends in Web Engineering: ICWE 2017 International Workshops, Liquid Multi-Device Software and EnWoT, practi-O-web, NLPIT, SoWeMine, Rome, Italy, June 5–8, Revised Selected Papers. Springer, Berlin
Ilić S et al (2018) Deep contextualized word representations for detecting sarcasm and irony. arXiv preprint arXiv:1809.09795
Jain G, Sharma M, Agarwal B (2019) Optimizing semantic LSTM for spam detection. Int J Inform Technol 11(2):239–250
Jain G, Sharma M, Agarwal B (2019) Spam detection in social media using convolutional and long short term memory neural network. Ann Math Artif Intell 85(1):21–44
Kandasamy K, Koroth P (2014) An integrated approach to spam classification on Twitter using URL analysis, natural language processing and machine learning techniques. In: 2014 IEEE Students’ Conference on Electrical, Electronics and Computer Science. IEEE
Kanodia S, Sasheendran R, Pathari V (2018) A novel approach for youtube video spam detection using markov decision process. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE
Lee S, Kim J (2014) Early filtering of ephemeral malicious accounts on Twitter. Comput Commun 54:48–57
Madisetty S, Desarkar MS (2018) A neural network-based ensemble approach for spam detection in Twitter. IEEE Trans Comput Social Syst 5(4):973–984
Mateen M et al (2017) A hybrid approach for spam detection for Twitter. In: 2017 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE
McCann B et al (2017) Learned in translation: Contextualized word vectors. arXiv preprint arXiv:1708.00107
Miller Z et al (2014) Twitter spammer detection using data stream clustering. Inf Sci 260:64–73
Peters ME et al (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365
Samsudin NAM et al (2019) Youtube spam detection framework using naïve bayes and logistic regression. Indonesian J Electr Eng Comput Sci 14(3):1508–1517
Tur G, Homsi MN (2017) Cost-sensitive classifier for spam detection on news media Twitter accounts. In: 2017 XLIII Latin American Computer Conference (CLEI). IEEE
Uysal AK (2018) Feature selection for comment spam filtering on YouTube. Data Sci Appl 1(1):4–8
Wang F et al (2016) Logo information recognition in large-scale social media data. Multimed Syst 22(1):63–73
Wang J et al (2017) Combining knowledge with deep convolutional neural networks for short text classification. In: IJCAI
Watcharenwong N, Saikaew K (2017) Spam detection for closed Facebook groups. In: 14th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE
Wu T et al (2017) Detecting spamming activities in twitter based on deep-learning technique. Concurr Comput Pract Exp 29(19):e4209
Yang C, Harkreader R, Gu G (2013) Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans Inf Forensics Secur 8(8):1280–1293
Zhao S et al (2017) Real-time multimedia social event detection in microblog. IEEE Trans Cybern 48(11):3218–3231
Zheng X et al (2015) Detecting spammers on social networks. Neurocomputing 159:27–34

Acknowledgements

The authors would like to thank Chen et al. [8], Alberto et al. [2], and Almeida et al. [3] for sharing their datasets.

Author information

Authors and Affiliations

Department of Computer Engineering, Kırıkkale University, Kırıkkale, Turkey
Razan Ghanem
Department of Computer Engineering, University of Turkish Aeronautical Association, Ankara, Turkey
Hasan Erbay

Corresponding author

Correspondence to Razan Ghanem.

Ethics declarations

Conflict of interest

None of the authors of this paper has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Razan Ghanem's alternative name is Rezan Bakir.

About this article

Cite this article

Ghanem, R., Erbay, H. Spam detection on social networks using deep contextualized word representation. Multimed Tools Appl 82, 3697–3712 (2023). https://doi.org/10.1007/s11042-022-13397-8

Received: 02 November 2020
Revised: 06 April 2022
Accepted: 02 July 2022
Published: 14 July 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13397-8
