A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering

Abawajy, Jemal; Kelarev, Andrei

doi:10.1007/978-3-642-35362-8_5

Jemal Abawajy¹⁹ &
Andrei Kelarev^19,20

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 7672))

Included in the following conference series:

International Symposium on Cyberspace Safety and Security

2501 Accesses
6 Citations

Abstract

This paper is devoted to multi-tier ensemble classifiers for the detection and filtering of phishing emails. We introduce a new construction of ensemble classifiers, based on the well known and productive multi-tier approach. Our experiments evaluate their performance for the detection and filtering of phishing emails. The multi-tier constructions are well known and have been used to design effective classifiers for email classification and other applications previously. We investigate new multi-tier ensemble classifiers, where diverse ensemble methods are combined in a unified system by incorporating different ensembles at a lower tier as an integral part of another ensemble at the top tier. Our novel contribution is to investigate the possibility and effectiveness of combining diverse ensemble methods into one large multi-tier ensemble for the example of detection and filtering of phishing emails. Our study handled a few essential ensemble methods and more recent approaches incorporated into a combined multi-tier ensemble classifier. The results show that new large multi-tier ensemble classifiers achieved better performance compared with the outcomes of the base classifiers and ensemble classifiers incorporated in the multi-tier system. This demonstrates that the new method of combining diverse ensembles into one unified multi-tier ensemble can be applied to increase the performance of classifiers if diverse ensembles are incorporated in the system.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

APWG: Anti-Phishing Working Group, http://apwg.org/ (accessed June 10, 2012)
Beliakov, G., Yearwood, J., Kelarev, A.: Application of rank correlation, clustering and classification in information security. Journal of Networks 7, 935–955 (2012)
Article Google Scholar
Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
MATH Google Scholar
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011), software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Article Google Scholar
Dazeley, R., Yearwood, J.L., Kang, B.H., Kelarev, A.V.: Consensus Clustering and Supervised Classification for Profiling Phishing Emails in Internet Commerce Security. In: Kang, B.-H., Richards, D. (eds.) PKAW 2010. LNCS, vol. 6232, pp. 235–246. Springer, Heidelberg (2010)
Chapter Google Scholar
Fan, R.E., Chen, P.H., Lin, C.J.: Working set selection using second order information for training svm. J. Machine Learning Research 6, 1889–1918 (2005)
MathSciNet MATH Google Scholar
Frank, F., Witten, I.: Generating accurate rule sets without global optimization. In: Proc. 15th Internat. Conf. on Machine Learning, pp. 144–151 (1998)
Google Scholar
Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: Proc. 13th Internat. Conf. Machine Learning, pp. 148–156 (1996)
Google Scholar
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. SIGKDD Explorations 11, 10–18 (2009)
Article Google Scholar
Hamid, I.R.A., Abawajy, J.: Hybrid Feature Selection for Phishing Email Detection. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds.) ICA3PP 2011, Part II. LNCS, vol. 7017, pp. 266–275. Springer, Heidelberg (2011)
Chapter Google Scholar
Hastie, T., Tibshirani, R.: Classification by pairwise coupling. In: Advances in Neural Information Processing Systems (1998)
Google Scholar
Islam, R., Abawajy, J.: A multi-tier phishing detection and filtering approach. Journal of Network and Computer Applications (to appear, 2012)
Google Scholar
Islam, R., Abawajy, J., Warren, M.: Multi-tier phishing email classification with an impact of classifier rescheduling. In: 10th International Symposium on Pervasive Systems, Algorithms, and Networks, ISPAN 2009, pp. 789–793 (2009)
Google Scholar
Islam, R., Singh, J., Chonka, A., Zhou, W.: Multi-classifier classification of spam email on an ubiquitous multi-core architecture. In: Proceedings – 2008 IFIP International Conference on Network and Parallel Computing, NPC 2008, pp. 210–217 (2008)
Google Scholar
Islam, R., Zhou, W.: Email classification using multi-tier classification algorithms. In: Proc. 7th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2008 (2008)
Google Scholar
Islam, R., Zhou, W., Chowdhury, M.: Email categorization using (2+1)-tier classification algorithms. In: Proceedings – 7th IEEE/ACIS International Conference on Computer and Information Science, IEEE/ACIS ICIS 2008, In Conjunction with 2nd IEEE/ACIS Int. Workshop on e-Activity, IEEE/ACIS IWEA 2008, pp. 276–281 (2008)
Google Scholar
Islam, R., Zhou, W., Gao, M., Xiang, Y.: An innovative analyser for multi-classifier email classification based on grey list analysis. Journal of Network and Computer Applications 32, 357–366 (2009)
Article Google Scholar
Islam, R., Zhou, W., Xiang, Y., Mahmood, A.: Spam filtering for network traffic security on a multi-core environment. Concurrency Computation Practice and Experience 21(10), 1307–1320 (2009)
Article Google Scholar
Keerthi, S., Shevade, S., Bhattacharyya, C., Murthy, K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation 13(3), 637–649 (2001)
Article Google Scholar
Kelarev, A., Brown, S., Watters, P., Wu, X.W., Dazeley, R.: Establishing reasoning communities of security experts for internet commerce security. In: Technologies for Supporting Reasoning Communities and Collaborative Decision Making: Cooperative Approaches, pp. 380–396. IGI Global (2011)
Google Scholar
Layton, R., Brown, S., Watters, P.: Using differencing to increase distinctiveness for phishing website clustering. In: Cybercrime and Trustworthy Computing Workshop, CTC 2009, Brisbane, Australia (2009)
Google Scholar
Layton, R., Watters, P.: Determining provenance in phishing websites using automated conceptual analysis. In: 4th Annual APWG eCrime Researchers Summit, Tacoma, WA (2009)
Google Scholar
Ma, L., Yearwood, J., Watters, P.: Establishing phishing provenance using orthographic features. In: Proceedings of the APWG eCrime Research Summit, eCRIME 2009, pp. 1–10 (2009)
Google Scholar
Madjarov, G., Gjorgjevikj, D., Delev, T.: Efficient Two Stage Voting Architecture for Pairwise Multi-label Classification. In: Li, J. (ed.) AI 2010. LNCS, vol. 6464, pp. 164–173. Springer, Heidelberg (2010)
Chapter Google Scholar
Martin, B.: Instance-based learning: Nearest neighbor with generalization, Hamilton, New Zealand (1995)
Google Scholar
Melville, P., Mooney, R.: Creating diversity in ensembles using artificial data. Information Fusion 6, 99–111 (2005)
Article Google Scholar
OECD: Organisation for Economic Cooperation and Development, OECD task force on spam, OECD anti-spam toolkit and its annexes, http://www.oecd.org/dataoecd/63/28/36494147.pdf (accessed November 20, 2011)
Phishing corpus homepage (2006), http://monkey.org/~jose/wiki/doku (accessed July 30, 2012)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods – Support Vector Learning (1998)
Google Scholar
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Google Scholar
Roy, S.: Nearest neighbor with generalization, Christchurch, New Zealand (2002)
Google Scholar
Seewald, A.K., Fürnkranz, J.: An Evaluation of Grading Classifiers. In: Hoffmann, F., Adams, N., Fisher, D., Guimarães, G., Hand, D.J. (eds.) IDA 2001. LNCS, vol. 2189, pp. 115–124. Springer, Heidelberg (2001)
Chapter Google Scholar
Spamassassin public corpus (2006), http://spamassassin.apache.org/publiccorpus/ (accessed July 29, 2012)
Ting, K., Witten, I.: Stacking bagged and dagged models. In: Fourteenth international Conference on Machine Learning, pp. 367–375 (1997)
Google Scholar
Webb, G.: Multiboosting: A technique for combining boosting and wagging. Machine Learning 40, 159–196 (2000)
Article Google Scholar
Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Elsevier/Morgan Kaufman, Amsterdam (2005)
MATH Google Scholar
Wolpert, D.: Stacked generalization. Neural Networks 5, 241–259 (1992)
Article Google Scholar
Yearwood, J., Webb, D., Ma, L., Vamplew, P., Ofoghi, B., Kelarev, A.: Applying clustering and ensemble clustering approaches to phishing profiling. In: Kennedy, P., Ong, K., Christen, P. (eds.) Proc. 8th Australasian Data Mining Conference on Data Mining and Analytics, AusDM 2009. CRPIT, vol. 101, pp. 25–34. ACS, Melbourne (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Information Technology, Deakin University, 221 Burwood Hwy, Burwood, 3125, Australia
Jemal Abawajy & Andrei Kelarev
School of SITE, University of Ballarat, PO Box 663, Ballarat, Victoria, 3353, Australia
Andrei Kelarev

Authors

Jemal Abawajy
View author publications
You can also search for this author in PubMed Google Scholar
Andrei Kelarev
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang & Wanlei Zhou &
Computer Science Department, ETSI Informatica, University of Malaga, Campus de Teatinos, 29170, Malaga, Spain
Javier Lopez
Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Ave., 90089-2564, Los Angeles, CA, USA
C.-C. Jay Kuo

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Abawajy, J., Kelarev, A. (2012). A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering. In: Xiang, Y., Lopez, J., Kuo, CC.J., Zhou, W. (eds) Cyberspace Safety and Security. CSS 2012. Lecture Notes in Computer Science, vol 7672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35362-8_5

Download citation

DOI: https://doi.org/10.1007/978-3-642-35362-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35361-1
Online ISBN: 978-3-642-35362-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics