An Evaluation of Machine Learning-Based Methods for Detection of Phishing Sites

Miyamoto, Daisuke; Hazeyama, Hiroaki; Kadobayashi, Youki

doi:10.1007/978-3-642-02490-0_66

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5506)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In this paper, we evaluate the performance of machine learning-based methods for the detection of phishing sites. We employ nine machine learning techniques: AdaBoost, Bagging, Support Vector Machines, Classification and Regression Trees, Logistic Regression, Random Forests, Neural Networks, Naive Bayes, and Bayesian Additive Regression Trees. Each technique is used to combine heuristics, and the resulting machine learning-based detection method distinguishes phishing sites from others. We analyze our dataset, which is composed of 1,500 phishing sites and 1,500 legitimate sites, classify the sites using the machine learning-based detection methods, and measure the performance. In our evaluation, we use the f₁ measure, error rate, and Area Under the ROC Curve (AUC) as performance metrics, in accordance with our requirements for detection methods. The highest f₁ measure is 0.8581, the lowest error rate is 14.15%, and the highest AUC is 0.9342, all of which are observed in the case of AdaBoost. We also observe that 7 out of 9 machine learning-based detection methods outperform the traditional detection method.
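
The three metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions (f₁ as the harmonic mean of precision and recall, error rate as the fraction of misclassified sites, AUC as the probability that a randomly chosen phishing site receives a higher score than a randomly chosen legitimate one); the paper's exact formulations are not shown in this preview. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not taken from the paper's dataset.

```python
from typing import Sequence, Tuple


def f1_and_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (f1, error_rate) for binary labels: 1 = phishing, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing flagged as phishing
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as phishing
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    error_rate = (fp + fn) / len(pairs)
    return f1, error_rate


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (phishing, legitimate) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three phishing and three legitimate sites.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]            # classifier confidence of "phishing"
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed decision threshold of 0.5

f1, err = f1_and_error_rate(y_true, y_pred)
area = auc(y_true, scores)
print(f"f1={f1:.4f} error_rate={err:.4f} auc={area:.4f}")
```

The f₁ measure and error rate depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason the two kinds of metric are often reported together.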

Author information

Authors and Affiliations

Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan
Daisuke Miyamoto, Hiroaki Hazeyama & Youki Kadobayashi

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Network Design and Research Center, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
Mario Köppen
Knowledge Engineering and Discovery Research Institute (KEDRI), School of Computing and Mathematical Sciences, Auckland University of Technology, 350 Queen Street, 10110, Auckland, New Zealand
Nikola Kasabov
Department of Electrical and Computer Engineering, Robotics Laboratory, Auckland University of Technology, 38 Princes Street, 1142, Auckland, New Zealand
George Coghill

About this paper

Cite this paper

Miyamoto, D., Hazeyama, H., Kadobayashi, Y. (2009). An Evaluation of Machine Learning-Based Methods for Detection of Phishing Sites. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_66

DOI: https://doi.org/10.1007/978-3-642-02490-0_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02489-4
Online ISBN: 978-3-642-02490-0
eBook Packages: Computer Science, Computer Science (R0)
