
A three-layer back-propagation neural network for spam detection using artificial immune concentration

  • Focus
  • Soft Computing

Abstract

In this paper, a three-layer back-propagation neural network (BPNN) is employed for spam detection using a concentration-based feature construction (CFC) approach. In the CFC approach, ‘self’ and ‘non-self’ concentrations are computed from ‘self’ and ‘non-self’ gene libraries, respectively, to form a two-element concentration vector that represents an e-mail compactly. A three-layer BPNN with a two-element input then classifies e-mails automatically. Comprehensive experiments on two public benchmark corpora, PU1 and Ling, show that the proposed CFC-based BPNN classifier is not only very fast but also achieves classification accuracies of 97% and 99% on PU1 and Ling, respectively, using just a two-element concentration feature vector.
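
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the concentration formula, so the sketch assumes that a concentration is the fraction of an e-mail's distinct tokens found in a gene library. The gene libraries, the toy e-mails, the hidden-layer size, the learning rate, and all function and class names are hypothetical choices made only for illustration.

```python
import math
import random


def concentration(tokens, gene_library):
    """Assumed definition: share of an e-mail's distinct tokens found in a gene library."""
    distinct = set(tokens)
    if not distinct:
        return 0.0
    return len(distinct & gene_library) / len(distinct)


def concentration_vector(tokens, self_genes, nonself_genes):
    """Two-element feature vector: [self concentration, non-self concentration]."""
    return [concentration(tokens, self_genes), concentration(tokens, nonself_genes)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerBPNN:
    """Input layer (2 units) -> sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in=2, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Every weight row carries a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update on squared error for a single example."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed with the output weights before their update.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        self.w_out = [w - lr * delta_out * h for w, h in zip(self.w_out, hb)]
        for j, row in enumerate(self.w_hidden):
            self.w_hidden[j] = [w - lr * delta_hidden[j] * v
                                for w, v in zip(row, xb)]

    def predict(self, x):
        return self.forward(x)[1]


# Hypothetical gene libraries and toy training e-mails (label 1.0 = spam).
SELF_GENES = {"meeting", "report", "thanks", "schedule"}
NONSELF_GENES = {"free", "winner", "cash", "offer"}
TRAIN = [
    ("thanks for the meeting report".split(), 0.0),
    ("schedule the report meeting".split(), 0.0),
    ("free cash offer winner".split(), 1.0),
    ("winner claim free offer".split(), 1.0),
]

net = ThreeLayerBPNN()
for _ in range(2000):
    for tokens, label in TRAIN:
        net.train_step(concentration_vector(tokens, SELF_GENES, NONSELF_GENES), label)

spam_score = net.predict(
    concentration_vector("free cash for the winner".split(), SELF_GENES, NONSELF_GENES))
ham_score = net.predict(
    concentration_vector("meeting schedule thanks".split(), SELF_GENES, NONSELF_GENES))
```

Because every e-mail is reduced to two numbers before it reaches the network, the cost of training and classification in this sketch depends on the hidden-layer size rather than on the vocabulary size, which is consistent with the speed claim in the abstract.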




Notes

  1. The PU1 corpus and Ling corpus may be downloaded from http://www.cil.pku.edu.cn/resources/.


Acknowledgments

This work was supported by the National High Technology Research and Development Program of China (863 Program) under grant number 2007AA01Z453, and partially supported by the National Natural Science Foundation of China (NSFC) under grant numbers 60673020 and 60875080. The authors thank the editor and the three anonymous referees for their insightful comments and suggestions, which greatly helped to improve the quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Ying Tan.

About this article

Cite this article

Ruan, G., Tan, Y. A three-layer back-propagation neural network for spam detection using artificial immune concentration. Soft Comput 14, 139–150 (2010). https://doi.org/10.1007/s00500-009-0440-2

  • DOI: https://doi.org/10.1007/s00500-009-0440-2
