Intelligent phishing website detection using machine learning

Multimedia Tools and Applications

Abstract

The need for cyber security grows every day as the amount of data available online continues to rise exponentially. Cyber security has become a field of prime importance in recent years and will remain so. Attackers are growing in number and use varied methods and techniques to extract sensitive information from users. Phishing is one of the most common security concerns, and it is distinctive in that, instead of targeting system vulnerabilities, it is a social engineering attack that targets human vulnerabilities. Users give up personal and sensitive data, such as passwords, card details, and bank details, by falling for scam emails or websites. The aim of this research is to create a tool that detects and differentiates a phishing website from a safe website, thus preventing users from opening risky URLs and keeping their personal data safe. Logistic Regression and MultinomialNB are used as the primary classification methods, alongside other techniques, namely Random Forest, Artificial Neural Network, and Support Vector Machine. Most common machine learning algorithms require intensive training on data, which makes them too slow to be executed in real time. The research therefore aims to create a model that can work in real time. The designed pipelined model, using Logistic Regression, achieved an accuracy of around 98%.
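The abstract does not spell out the stages of the pipelined model, so the following is only a minimal sketch of what a URL classification pipeline built around logistic regression might look like. The tokenizer, the bag-of-tokens features, the class name `UrlLogisticPipeline`, the hyperparameters, and the toy URLs are all assumptions made for illustration, not the authors' method or data. It uses only the Python standard library so that it runs anywhere; a real system would more likely rely on a machine learning library and a far larger labelled dataset.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed feature scheme)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


class UrlLogisticPipeline:
    """Hypothetical pipeline: tokenizer -> token counts -> logistic regression."""

    def __init__(self, lr=0.5, epochs=300):
        self.lr = lr            # learning rate for stochastic gradient descent
        self.epochs = epochs    # number of passes over the training set
        self.weights = {}       # one weight per token seen during training
        self.bias = 0.0

    def _probability(self, counts):
        z = self.bias + sum(self.weights.get(t, 0.0) * n for t, n in counts.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, urls, labels):
        rows = [Counter(tokenize(u)) for u in urls]
        for _ in range(self.epochs):
            for counts, y in zip(rows, labels):
                # Gradient of the log loss with respect to z is (p - y).
                err = self._probability(counts) - y
                self.bias -= self.lr * err
                for tok, n in counts.items():
                    self.weights[tok] = self.weights.get(tok, 0.0) - self.lr * err * n
        return self

    def predict_proba(self, url):
        """Estimated probability that the URL is phishing (label 1)."""
        return self._probability(Counter(tokenize(url)))

    def predict(self, url):
        return int(self.predict_proba(url) >= 0.5)


# Toy, made-up training data: 1 = phishing, 0 = legitimate.
train_urls = [
    "http://secure-login.example-bank.verify-account.test/update/password",
    "http://paypal.account-verify.test/login/confirm",
    "http://free-gift.test/claim/verify/login",
    "http://account-update.test/secure/confirm/password",
    "https://www.example.org/docs/tutorial",
    "https://news.example.com/world/article",
    "https://shop.example.com/products/shoes",
    "https://blog.example.net/posts/python-tips",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = UrlLogisticPipeline().fit(train_urls, train_labels)

for candidate in ["http://verify-login.test/password", "https://www.example.org/about"]:
    print(candidate, round(model.predict_proba(candidate), 3), model.predict(candidate))
```

Once trained, scoring a URL in this sketch costs one tokenization pass and one weighted sum, which is the kind of property that makes a linear model attractive for real-time use. The eight toy URLs say nothing about the accuracy reported in the abstract.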

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Pranav M. Pawar.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jha, A.K., Muthalagu, R. & Pawar, P.M. Intelligent phishing website detection using machine learning. Multimed Tools Appl 82, 29431–29456 (2023). https://doi.org/10.1007/s11042-023-14731-4

  • DOI: https://doi.org/10.1007/s11042-023-14731-4