Robust multimedia spam filtering based on visual, textual, and audio deep features and random forest

Kihal, Marouane; Hamza, Lamia

doi:10.1007/s11042-023-15170-x

Published: 03 April 2023

Multimedia Tools and Applications, Volume 82, pages 40819–40837 (2023)

Abstract

Internet and social media users increasingly demand better protection against spam. Despite numerous studies on spam detection, no prior contribution has addressed filtering the text, image, audio, and video modalities of multimedia content simultaneously. We therefore present a new deep multimodal decision-level fusion system for detecting multimedia spam. The proposed system employs Convolutional Neural Networks (CNN) for feature extraction and selection. The extracted features are organized into three independent vectors, namely visual, textual, and audio (VTA) vectors, to obtain a strong content representation. Each vector is then fed individually into a Random Forest (RF) model for classification; we therefore call our model VTA-CNN-RF. We show that our model outperforms seven Machine Learning (ML) algorithms on each of the three types of VTA information. Additional experiments demonstrate the benefit of fusion for the system’s overall performance. Our results indicate a precision of 99.08% on a publicly available hybrid dataset that includes text and image content, and 98.20% on a composite multimedia dataset. The proposed VTA-CNN-RF model provides superior spam identification compared to previous methods.
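
The abstract describes decision-level fusion: each modality is classified on its own, and the per-modality decisions are then combined. The abstract does not state the combination rule, so the following is only a minimal sketch that assumes soft voting, that is, averaging the spam probabilities reported by the visual, textual, and audio classifiers. The function name `fuse_decisions`, the 0.5 threshold, and the handling of missing modalities are all illustrative assumptions, not the authors' method.

```python
from statistics import mean
from typing import Mapping, Optional, Tuple

# Modality names follow the VTA vectors named in the abstract.
MODALITIES = ("visual", "textual", "audio")


def fuse_decisions(
    spam_probs: Mapping[str, Optional[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[bool, float]:
    """Decision-level (late) fusion by soft voting.

    spam_probs maps a modality name to the spam probability produced by
    that modality's own classifier (for example a Random Forest), or to
    None when the message does not contain that modality.
    Returns (is_spam, fused_probability).
    """
    scores = [spam_probs.get(name) for name in MODALITIES]
    available = [s for s in scores if s is not None]
    if not available:
        raise ValueError("at least one modality score is required")
    # Average only over the modalities that are actually present.
    fused = mean(available)
    return fused >= threshold, fused


if __name__ == "__main__":
    # A message with an image and text but no audio track.
    is_spam, score = fuse_decisions(
        {"visual": 0.9, "textual": 0.7, "audio": None}
    )
    print(is_spam, round(score, 2))
```

Averaging over only the available modalities lets the same rule cover text-plus-image messages and full multimedia messages; a majority vote over hard labels would be an equally plausible alternative under the same assumptions.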

Data Availability

No data are associated with this article. References are given in the paper for the publicly available datasets it mentions.

Acknowledgements

This work has been sponsored by the General Directorate for Scientific Research and Technological Development, Ministry of Higher Education and Scientific Research (DGRSDT), Algeria.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Laboratory of Medical Informatics (LIMED), Faculty of Exact Sciences, University of Bejaia, 06000, Bejaia, Algeria
Marouane Kihal & Lamia Hamza

Corresponding author

Correspondence to Lamia Hamza.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kihal, M., Hamza, L. Robust multimedia spam filtering based on visual, textual, and audio deep features and random forest. Multimed Tools Appl 82, 40819–40837 (2023). https://doi.org/10.1007/s11042-023-15170-x

Received: 16 October 2022
Revised: 13 February 2023
Accepted: 22 March 2023
Published: 03 April 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11042-023-15170-x
