Abstract
This paper studies multi-tier ensemble classifiers for the detection and filtering of phishing emails. Multi-tier constructions are well known and have previously been used to design effective classifiers for email classification and other applications. We introduce a new construction in which diverse ensemble methods are combined into one unified system: different ensembles at a lower tier are incorporated as an integral part of another ensemble at the top tier. Our novel contribution is to investigate whether diverse ensemble methods can be combined into one large multi-tier ensemble, and how effective the result is, using the detection and filtering of phishing emails as the example application. The study covers several essential ensemble methods as well as more recent approaches, all incorporated into a combined multi-tier ensemble classifier. In our experiments, the new large multi-tier ensemble classifiers achieved better performance than the base classifiers and the ensemble classifiers incorporated in the multi-tier system. This demonstrates that combining diverse ensembles into one unified multi-tier ensemble can increase classifier performance, provided that the incorporated ensembles are diverse.
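To make the tiered construction concrete, the following is a minimal sketch in plain Python and is not the authors' configuration. It assumes a bagging-style ensemble at the top tier whose members are themselves random-subspace ensembles of decision stumps at the lower tier. The abstract does not name the ensemble methods used, so this pairing, the three feature names, the toy data, and the stratified bootstrap are all illustrative assumptions.

```python
import random
from collections import Counter


def majority(votes):
    """Return the most frequent vote."""
    return Counter(votes).most_common(1)[0][0]


class Stump:
    """Base classifier: a threshold test on one feature."""

    def fit(self, X, y, features):
        best = None
        for f in features:
            for t in sorted({row[f] for row in X}):
                for sign in (1, -1):
                    preds = [1 if sign * (row[f] - t) >= 0 else 0 for row in X]
                    err = sum(p != label for p, label in zip(preds, y))
                    if best is None or err < best[0]:
                        best = (err, f, t, sign)
        _, self.f, self.t, self.sign = best
        return self

    def predict(self, row):
        return 1 if self.sign * (row[self.f] - self.t) >= 0 else 0


class SubspaceEnsemble:
    """Lower tier: stumps trained on random feature subsets, majority vote."""

    def __init__(self, n_members, rng):
        self.n_members, self.rng = n_members, rng

    def fit(self, X, y):
        n_features = len(X[0])
        k = max(1, n_features // 2)
        self.members = [
            Stump().fit(X, y, self.rng.sample(range(n_features), k))
            for _ in range(self.n_members)
        ]
        return self

    def predict(self, row):
        return majority(m.predict(row) for m in self.members)


def stratified_bootstrap(X, y, rng):
    """Resample with replacement within each class (a simplification that
    keeps both classes present in every resample of this tiny data set)."""
    idx = []
    for label in sorted(set(y)):
        pool = [i for i, v in enumerate(y) if v == label]
        idx += [rng.choice(pool) for _ in pool]
    return [X[i] for i in idx], [y[i] for i in idx]


class BaggingTier:
    """Top tier: each member is a whole lower-tier ensemble."""

    def __init__(self, make_member, n_members, rng):
        self.make_member, self.n_members, self.rng = make_member, n_members, rng

    def fit(self, X, y):
        self.members = []
        for _ in range(self.n_members):
            Xb, yb = stratified_bootstrap(X, y, self.rng)
            self.members.append(self.make_member().fit(Xb, yb))
        return self

    def predict(self, row):
        return majority(m.predict(row) for m in self.members)


# Hypothetical features: (links, suspicious words, HTML forms) per email.
legit = [[1, 0, 0], [2, 1, 0], [0, 0, 0], [1, 1, 0], [2, 0, 0], [0, 1, 0]]
phish = [[5, 3, 1], [6, 4, 2], [7, 3, 1], [5, 5, 2], [8, 4, 1], [6, 6, 2]]
X, y = legit + phish, [0] * len(legit) + [1] * len(phish)

rng = random.Random(0)
model = BaggingTier(lambda: SubspaceEnsemble(3, rng), 5, rng).fit(X, y)
print(model.predict([9, 7, 3]), model.predict([1, 0, 0]))  # prints "1 0"
```

The top tier never sees a base classifier directly: it resamples the training data and delegates each resample to a complete lower-tier ensemble, which is the sense in which one ensemble is an integral part of another. The toy data is separable on every feature, so it illustrates the structure only and says nothing about performance on real phishing corpora.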
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abawajy, J., Kelarev, A. (2012). A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering. In: Xiang, Y., Lopez, J., Kuo, CC.J., Zhou, W. (eds) Cyberspace Safety and Security. CSS 2012. Lecture Notes in Computer Science, vol 7672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35362-8_5
DOI: https://doi.org/10.1007/978-3-642-35362-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35361-1
Online ISBN: 978-3-642-35362-8
eBook Packages: Computer Science, Computer Science (R0)