BERT- and CNN-based TOBEAT approach for unwelcome tweets detection

Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih

doi:10.1007/s13278-022-00970-0

Original Article
Published: 03 October 2022

Volume 12, article number 144, (2022)
Social Network Analysis and Mining

Sarra Ouni¹,
Fethi Fkih² &
Mohamed Nazih Omri¹

Abstract

Social media platforms have become an inescapable part of daily life, and Twitter is one of the major online social networking platforms. It has recently seen exponential growth, with rising interest from registered users. This popularity attracts cybercriminals (spammers), who spread malware and advertisements through links shared in tweets and hijack trending topics to capture the attention of legitimate users. These spammers also send unwelcome messages, known as spam (or junk e-mail), and carry out other malicious activity. Spam on Twitter has become a pervasive problem that must be solved. Several solutions have been proposed to address it. However, the main existing methods suffer from many limitations and cannot reliably detect spammers on social networks. In this paper, we propose a new approach based on the extraction of new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach relies on BERT (bidirectional encoder representations from transformers) and a CNN (convolutional neural network). To implement it, we developed a new framework that combines topic-based features with contextual BERT embeddings. The resulting feature vector is then fed into a supervised classifier. Experimental results on a Twitter data collection show that the CNN is the most suitable classifier for the spam filtering task. Moreover, the comparative study shows that, on the Twitter data set, our approach outperforms previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach recorded 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.
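
The five evaluation measures named in the abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only: the function name `binary_metrics` and the counts are hypothetical and not taken from the paper, and G-mean is assumed here to be the geometric mean of recall (sensitivity) and specificity, which is one common definition; the abstract does not state which definition the authors use.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, F1 and G-mean from confusion counts.

    Assumes spam is the positive class and that no denominator is zero.
    G-mean is taken as sqrt(recall * specificity) (an assumption, see lead-in).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration; these are not results from the paper.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting G-mean alongside accuracy is useful when the spam and legitimate classes are imbalanced, because it drops sharply if either class is poorly recognized.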

Notes

https://infolab.tamu.edu/data/.

Author information

Authors and Affiliations

MARS Research Laboratory, University of Sousse, 17ES05, Sousse, LR, Tunisia
Sarra Ouni & Mohamed Nazih Omri
Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Fethi Fkih

Corresponding author

Correspondence to Sarra Ouni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ouni, S., Fkih, F. & Omri, M.N. BERT- and CNN-based TOBEAT approach for unwelcome tweets detection. Soc. Netw. Anal. Min. 12, 144 (2022). https://doi.org/10.1007/s13278-022-00970-0

Received: 24 February 2022
Revised: 31 August 2022
Accepted: 02 September 2022
Published: 03 October 2022
DOI: https://doi.org/10.1007/s13278-022-00970-0
