A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering

  • Conference paper
Cyberspace Safety and Security (CSS 2012)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 7672)

Abstract

This paper is devoted to multi-tier ensemble classifiers for the detection and filtering of phishing emails. Multi-tier constructions are well known and have previously been used to design effective classifiers for email classification and other applications. We introduce a new construction based on this approach, in which diverse ensemble methods are combined in a unified system by incorporating different ensembles at a lower tier as an integral part of another ensemble at the top tier. Our novel contribution is to investigate the possibility and effectiveness of combining diverse ensemble methods into one large multi-tier ensemble, using the detection and filtering of phishing emails as the example application. Our study covers several essential ensemble methods together with more recent approaches, all incorporated into a combined multi-tier ensemble classifier. In our experiments, the new large multi-tier ensemble classifiers achieved better performance than both the base classifiers and the ensemble classifiers incorporated in the multi-tier system. This demonstrates that combining diverse ensembles into one unified multi-tier ensemble can increase classifier performance, provided that the ensembles incorporated in the system are diverse.
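
The two-tier idea described above can be illustrated with a minimal sketch. The abstract does not say which ensemble methods, base classifiers, or features the paper combines, so everything concrete here is an assumption made for illustration: the lower tier is an AdaBoost-style ensemble of decision stumps, the top tier is bagging with a majority vote, and the two toy features (a link count and an "urgent wording" flag) and all class names are hypothetical.

```python
import math
import random
from collections import Counter


class Stump:
    """Base classifier: a threshold test on one feature; labels are +1 / -1."""

    def fit(self, X, y, w):
        best_err = float("inf")
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for sign in (1, -1):
                    # Weighted error of the rule "sign if x[j] > t else -sign".
                    err = sum(wi for row, yi, wi in zip(X, y, w)
                              if (sign if row[j] > t else -sign) != yi)
                    if err < best_err:
                        best_err, self.j, self.t, self.sign = err, j, t, sign
        return best_err

    def predict(self, row):
        return self.sign if row[self.j] > self.t else -self.sign


class Boosted:
    """Lower tier (assumed): AdaBoost-style weighted vote over stumps."""

    def __init__(self, rounds=5):
        self.rounds = rounds

    def fit(self, X, y):
        n = len(X)
        w = [1.0 / n] * n
        self.members = []
        for _ in range(self.rounds):
            stump = Stump()
            err = stump.fit(X, y, w)
            err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
            alpha = 0.5 * math.log((1 - err) / err)
            self.members.append((alpha, stump))
            # Increase the weight of misclassified examples, then renormalise.
            w = [wi * math.exp(-alpha * yi * stump.predict(row))
                 for row, yi, wi in zip(X, y, w)]
            total = sum(w)
            w = [wi / total for wi in w]
        return self

    def predict(self, row):
        score = sum(alpha * stump.predict(row) for alpha, stump in self.members)
        return 1 if score >= 0 else -1


class Bagged:
    """Top tier (assumed): bagging whose base learner is itself an ensemble."""

    def __init__(self, make_base, n_members=7, seed=0):
        self.make_base = make_base
        self.n_members = n_members
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.members = []
        for _ in range(self.n_members):
            # Each member is trained on its own bootstrap sample.
            idx = [rng.randrange(n) for _ in range(n)]
            member = self.make_base().fit([X[i] for i in idx],
                                          [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, row):
        votes = Counter(member.predict(row) for member in self.members)
        return votes.most_common(1)[0][0]


# Hypothetical toy data: [number of links, contains urgent wording (0/1)].
# Label +1 stands for phishing, -1 for legitimate email.
X = [[0, 0], [1, 0], [2, 1], [3, 0], [6, 1], [7, 1], [8, 0], [9, 1]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

lower = Boosted().fit(X, y)                           # one lower-tier ensemble
model = Bagged(Boosted, n_members=7, seed=0).fit(X, y)  # two-tier ensemble
predictions = [model.predict(row) for row in X]
print(predictions)
```

Passing the `Boosted` class as the `make_base` factory is what makes the construction multi-tier: the top-tier ensemble treats an entire lower-tier ensemble as a single base learner. Replacing either tier with a different ensemble method only requires another class with the same `fit` and `predict` methods.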

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abawajy, J., Kelarev, A. (2012). A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering. In: Xiang, Y., Lopez, J., Kuo, CC.J., Zhou, W. (eds) Cyberspace Safety and Security. CSS 2012. Lecture Notes in Computer Science, vol 7672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35362-8_5

  • DOI: https://doi.org/10.1007/978-3-642-35362-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35361-1

  • Online ISBN: 978-3-642-35362-8

  • eBook Packages: Computer Science, Computer Science (R0)
