Abstract
In this paper, we apply the Stacked Auto-encoder, one of the main types of deep networks and a recent hot topic in machine learning, to spam detection, and comprehensively compare its performance with other prevalent machine learning techniques that are commonly used in spam filtering. Experiments were conducted on five benchmark corpora, namely PU1, PU2, PU3, PUA and Enron-Spam. Accuracy and the \(F_1\) measure are selected as the main criteria in analyzing and discussing the results. Experimental results demonstrate that the Stacked Auto-encoder performs better than Naive Bayes, Support Vector Machine, Decision Tree, Boosting, Random Forest and a traditional Artificial Neural Network in both accuracy and the \(F_1\) measure, suggesting that deep learning can be applied to spam filtering in the real world.
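The two evaluation criteria named above, accuracy and the \(F_1\) measure, can be illustrated with a minimal sketch. It assumes binary labels with spam as the positive class (encoded as 1) and uses the standard definitions; the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the paper or its experiments.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Assumption: `positive` marks the spam class; every other label is
    treated as legitimate mail.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Accuracy: fraction of all messages classified correctly.
    accuracy = (tp + tn) / len(pairs)

    # F1: harmonic mean of precision and recall, written in the
    # equivalent form 2*TP / (2*TP + FP + FN). Defined as 0.0 when
    # there are no true positives, false positives, or false negatives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels (hypothetical): 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Accuracy counts every correct decision equally, whereas \(F_1\) looks only at the spam class, so the two can diverge when a corpus is imbalanced between spam and legitimate mail.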
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mi, G., Gao, Y., Tan, Y. (2015). Apply Stacked Auto-Encoder to Spam Detection. In: Tan, Y., Shi, Y., Buarque, F., Gelbukh, A., Das, S., Engelbrecht, A. (eds) Advances in Swarm and Computational Intelligence. ICSI 2015. Lecture Notes in Computer Science, vol 9141. Springer, Cham. https://doi.org/10.1007/978-3-319-20472-7_1
DOI: https://doi.org/10.1007/978-3-319-20472-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20471-0
Online ISBN: 978-3-319-20472-7
eBook Packages: Computer Science, Computer Science (R0)