Abstract
Spam detection on social networks can be framed as a short text classification problem. It is a challenging natural language processing task because short texts are sparse and ambiguous. A key step in addressing it is a powerful text representation. Traditional word embedding models mitigate data sparsity by representing words as dense vectors, but they have limitations that prevent them from handling some problems effectively. The most common is the “out of vocabulary” (OOV) problem: the models cannot provide any vector for a word that is absent from their dictionary. Another is context independence: the models output a single vector for each word, regardless of the surrounding words in the sentence. To overcome these problems, we propose a new model based on deep contextualized word representation. In this study, we develop CBLSTM (Contextualized Bi-directional Long Short Term Memory neural network), a novel deep learning architecture that combines a bidirectional long short term memory neural network with embeddings from language models, to address the problem of spam texts on social networks. Experimental results on three benchmark datasets show that the proposed method achieves high accuracy and outperforms existing state-of-the-art methods for detecting spam on social networks.
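The two limitations named in the abstract (out-of-vocabulary words and context independence) can be illustrated with a toy sketch. It is not the CBLSTM model or ELMo. The lookup table, the sine-based character features, the fixed recurrence weights, and the function names (`static_embed`, `char_embed`, `recurrent_pass`, `contextual_embed`) are all hypothetical, and a plain tanh recurrence stands in for LSTM cells. A static table returns nothing for an unseen word and never looks at the sentence. A character-based input fed through forward and backward passes gives every string a vector, and that vector depends on the word's neighbours.

```python
import math

DIM = 4  # toy vector size (assumption); real contextual embeddings are far wider

# Hypothetical static lookup table (word2vec/GloVe-style): one fixed vector per known word.
STATIC_TABLE = {
    "free": [0.9, 0.1, 0.0, 0.2],
    "bank": [0.2, 0.7, 0.5, 0.1],
    "river": [0.0, 0.6, 0.9, 0.3],
    "loan": [0.7, 0.5, 0.2, 0.0],
}


def static_embed(word):
    """Static lookup: takes only the word (no sentence), returns None if out of vocabulary."""
    return STATIC_TABLE.get(word)


def char_embed(word):
    """Toy character-level input vector, so any string, even an unseen one, gets a vector.
    ELMo builds its input layer from characters; these sine features are only a stand-in."""
    vec = [0.0] * DIM
    for pos, ch in enumerate(word):
        for d in range(DIM):
            vec[d] += math.sin((ord(ch) + 1) * (d + 1) + pos)
    return [v / max(len(word), 1) for v in vec]


def recurrent_pass(inputs, w_in=0.8, w_rec=0.5):
    """One direction of a toy elementwise RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    A plain tanh cell stands in for the LSTM cells used in practice."""
    h = [0.0] * DIM
    states = []
    for x in inputs:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def contextual_embed(tokens):
    """Concatenate forward and backward states, so each token's vector depends on the
    whole sentence rather than on the token alone."""
    inputs = [char_embed(t) for t in tokens]
    forward = recurrent_pass(inputs)
    backward = recurrent_pass(inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    print(static_embed("crypto"))  # None: out of vocabulary
    v1 = contextual_embed(["the", "river", "bank"])[2]
    v2 = contextual_embed(["bank", "loan", "offer"])[0]
    print(len(v1), v1 != v2)  # 8 True: same word, different vectors in different contexts
    print(len(contextual_embed(["crypto"])[0]))  # 8: the unseen word still gets a vector
```

In the architecture the abstract describes, the contextual vectors would come from the pretrained language-model embeddings and feed a trained bidirectional LSTM classifier, rather than from the fixed toy weights used here.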
Ethics declarations
Conflict of interest
None of the authors of this paper has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Razan Ghanem's alternative name is Rezan Bakir.
About this article
Cite this article
Ghanem, R., Erbay, H. Spam detection on social networks using deep contextualized word representation. Multimed Tools Appl 82, 3697–3712 (2023). https://doi.org/10.1007/s11042-022-13397-8
DOI: https://doi.org/10.1007/s11042-022-13397-8