Intelligent phishing website detection using machine learning

Jha, Ashish Kumar; Muthalagu, Raja; Pawar, Pranav M.

doi:10.1007/s11042-023-14731-4

Published: 24 February 2023

Volume 82, pages 29431–29456, (2023)

Multimedia Tools and Applications

Abstract

The need for cyber security grows every day as the amount of data available online continues to rise exponentially. Cyber security has become a field of prime importance in recent years and will remain so. The number of hackers and malicious actors is growing day by day, and they use varied methods and techniques to extract important information from users. “Phishing” is one of the most common yet unique security concerns. It is unique in that, instead of targeting system vulnerabilities, it is a social engineering attack that targets human vulnerabilities. Users give up personal and sensitive data, viz. passwords, card details, bank details, etc., by falling for scam emails or websites. The goal of this research is to create a tool that detects phishing websites and differentiates them from safe websites, thus preventing users from opening risky URLs and keeping their personal data safe. Logistic Regression and MultinomialNB are used as the prime classification methods, alongside other techniques, viz. Random Forest, Artificial Neural Network and Support Vector Machine. Most common machine learning algorithms require intensive training on data, which makes them too slow to execute in real time. The aim of the research is to create a model that can work in real time. The designed pipelined model using Logistic Regression achieved an accuracy of around 98%.
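
The abstract does not state the feature set, tokenisation or hyperparameters of the pipelined model, so the following is only a minimal sketch of the general idea: split each URL into tokens, count them, and fit a logistic regression classifier on the counts. The toy URLs, the function names (tokenize, featurize, train, predict_proba) and the learning-rate, epoch and regularisation values are all assumptions made for illustration; none of them come from the paper, and the sketch makes no claim to reproduce the reported accuracy.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (1 = phishing, 0 = safe); NOT the paper's dataset.
TRAIN = [
    ("http://secure-login.paypa1.com.verify-account.ru/signin", 1),
    ("http://192.168.10.5/bank/update/password.php", 1),
    ("http://free-gift-card.win/claim/login/verify", 1),
    ("http://appleid.confirm-billing.info/account/update", 1),
    ("https://www.wikipedia.org/wiki/Phishing", 0),
    ("https://github.com/python/cpython", 0),
    ("https://docs.python.org/3/library/re.html", 0),
    ("https://www.bbc.com/news/technology", 0),
]


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def featurize(url):
    """Stage 1 of the pipeline: bag-of-words token counts for one URL."""
    return Counter(tokenize(url))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(feats, weights, bias):
    """Linear score: bias plus the weighted sum of token counts."""
    return bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())


def train(samples, epochs=300, lr=0.1, l2=0.01):
    """Stage 2: fit logistic regression with plain SGD and L2 shrinkage.

    epochs, lr and l2 are assumed values chosen for the toy data only.
    """
    weights, bias = {}, 0.0
    data = [(featurize(url), label) for url, label in samples]
    for _ in range(epochs):
        for feats, label in data:
            err = sigmoid(score(feats, weights, bias)) - label
            bias -= lr * err
            for token, count in feats.items():
                w = weights.get(token, 0.0)
                weights[token] = w - lr * (err * count + l2 * w)
    return weights, bias


def predict_proba(url, weights, bias):
    """Estimated probability that the URL is a phishing URL."""
    return sigmoid(score(featurize(url), weights, bias))


if __name__ == "__main__":
    weights, bias = train(TRAIN)
    for url, label in TRAIN:
        print(label, round(predict_proba(url, weights, bias), 3), url)
```

Once the weights are fitted, scoring a URL is a tokenisation pass plus one weighted sum, which is consistent with the abstract's emphasis on a model that can run in real time. On such a small toy set the classifier also picks up accidental cues (for example the scheme token), so a realistic evaluation would need a large labelled dataset and held-out test data.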

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai, UAE
Ashish Kumar Jha, Raja Muthalagu & Pranav M. Pawar

Corresponding author

Correspondence to Pranav M. Pawar.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jha, A.K., Muthalagu, R. & Pawar, P.M. Intelligent phishing website detection using machine learning. Multimed Tools Appl 82, 29431–29456 (2023). https://doi.org/10.1007/s11042-023-14731-4

Received: 03 January 2022
Revised: 24 March 2022
Accepted: 04 February 2023
Published: 24 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s11042-023-14731-4
