Spam e-mail classification for the Internet of Things environment using semantic similarity approach

Venkatraman, S.; Surendiran, B.; Arun Raj Kumar, P.

doi:10.1007/s11227-019-02913-7

Published: 05 June 2019

Volume 76, pages 756–776, (2020)

The Journal of Supercomputing

S. Venkatraman¹,
B. Surendiran¹ &
P. Arun Raj Kumar²

Abstract

Unauthorized advertising messages for services or products sent via electronic mail are called spam e-mails. Detecting spam e-mail remains a challenging task. Existing countermeasures based on statistical keyword, conceptual, and IP address-based blacklists are not efficient because they have difficulty finding new attack patterns generated by Internet of Things botnet devices. Other spam detection approaches rely on a hybrid of conceptual knowledge engineering and machine learning techniques. However, modern spammers evade these hybrid techniques through word polysemy and word ambiguity, exploiting the context-sensitive nature of words. In this paper, the integration of Naïve Bayesian classification with conceptual and semantic similarity techniques is proposed to combat the ambiguity arising from polysemy in spam detection. To analyse the effectiveness of our approach, experiments were conducted on benchmark data sets such as Spambase, PU1, the Enron corpus, and Ling-spam. The experimental results show that our proposed system achieves an accuracy of 98.89%, higher than the existing approaches.
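
As a rough illustration of the kind of pipeline the abstract names, the following Python sketch pairs a multinomial Naïve Bayes classifier (with add-one smoothing) with a simple word-to-concept lookup. It is a minimal sketch under stated assumptions and not the authors' implementation: the abstract does not describe how the conceptual and semantic similarity step works, so the `CONCEPTS` map, the toy `TRAINING` messages, and the function names `tokenize`, `train`, and `classify` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical concept map (an assumption, not taken from the paper):
# different surface words are collapsed onto one shared concept label.
CONCEPTS = {
    "cheap": "discount",
    "bargain": "discount",
    "prize": "reward",
    "winner": "reward",
}

# Toy training messages, invented purely for illustration.
TRAINING = [
    ("cheap pills win prize now", "spam"),
    ("winner claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and map known words to their concept."""
    return [CONCEPTS.get(word, word) for word in text.lower().split()]


def train(examples):
    """Count documents and tokens per label; return the counts and vocabulary."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # tokens never seen in training are ignored
            # Add-one (Laplace) smoothed log likelihood of the token.
            numerator = word_counts[label][token] + 1
            score += math.log(numerator / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("bargain pills for every winner", model))
print(classify("project meeting on monday", model))
```

In this toy setup, "bargain" and "winner" never appear in the spam training messages as written, yet they still contribute spam evidence because the lookup maps them onto concepts that do. A static lookup like this does not address polysemy, which the abstract identifies as the harder problem; handling it would require context-dependent similarity, which this sketch does not attempt.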

Acknowledgements

This work was partially supported by the Ministry of Human Resource Development (MHRD), New Delhi, India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Puducherry, India
S. Venkatraman & B. Surendiran
Department of Computer Science and Engineering, National Institute of Technology, Calicut, India
P. Arun Raj Kumar

Corresponding author

Correspondence to S. Venkatraman.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Venkatraman, S., Surendiran, B. & Arun Raj Kumar, P. Spam e-mail classification for the Internet of Things environment using semantic similarity approach. J Supercomput 76, 756–776 (2020). https://doi.org/10.1007/s11227-019-02913-7

Published: 05 June 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11227-019-02913-7
