Abstract
The need for cyber security is growing every day as the amount of data available online continues to rise exponentially. The cyber security has become a field of prime importance in the recent years and will continue to be so. Hackers and malpractitioners are growing day by day and are using varied methods and techniques to extract information of prime importance from the users. “Phishing” is one of the most common yet unique security concern. It is unique in the way that instead of targeting the system vulnerabilities, it is a social engineering attack targeting human vulnerabilities. Users give up their personal and sensitive data viz. passwords, card details, bank details etc. by falling to scam emails or websites. The target of this research is to create a tool which will help to detect and differentiate a phishing website from a safe website, thus preventing users into opening risky URLs and keeping their personal data safe. Linear Regression and MultinomialNB are used as the prime methods for the classification apart from other techniques viz. Random Forest, Artificial Neural Network and Support Vector Machine. Most common machine learning algorithms require intensive training of data, causing the process to become slow in order to be executed in real time. The aim of the research is to create a model that can work in real time. The designed pipelined model using Logistic regression, achieved an accuracy of around 98%.
Similar content being viewed by others
Data availability
The dataset generated during and/or analyze during the current study are available from the corresponding author.
References
Alswailem A, Alabdullah B, Alrumayh N, Alsedrani A (2019) Detecting Phishing Websites Using Machine Learning, 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), pp. 1–6, https://doi.org/10.1109/CAIS.2019.8769571
Aydin M, Baykal N (2015) Feature extraction and classification phishing websites based on URL, 2015 IEEE Conference on Communications and Network Security (CNS), pp. 769–770, https://doi.org/10.1109/CNS.2015.7346927
Bac TN, Duy PT, Pham VH (2021) PWDGAN: Generating Adversarial Malicious URL Examples for Deceiving Black-Box Phishing Website Detector using GANs. In: 2021 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), IEEE, 2021, pp. 1–4
Blasi M (2009) Techniques for detecting zero-day phishing websites. Master of Science Thesis, Iowa State University, Ames
Breve B, Caruccio L, Cirillo S, Desiato D, Deufemia V, Polese G (2020) Enhancing user awareness during internet browsing, In ITASEC, pp. 71–81
Caruccio L, Desiato D, Polese G (2018) Fake account identification in social networks. In: 2018 IEEE international conference on big data (big data), IEEE, pp. 5078–5085
Davis DB (2021) ISTR 2019: internet of things cyber-attacks grow more diverse. Symantec Enterprise Blogs-Expert Perspectives. https://symantec-enterprise-blogs.security.com/blogs/expert-perspectives/istr-2019-internet-things-cyber-attacks-growmore-diverse. Accessed 26 July 2021
Desiato D (2018) A Methodology for GDPR Compliant Data Processing. In SEBD
Dey N, Samhitha S, Hariprasad M, Anand A, Gadad V (2021) Analysis of machine learning algorithms by developing a phishing email and website detection model. In: IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bangalore, India, pp 1–7. https://doi.org/10.1109/CSITSS54238.2021.9683131
Ibm.com. (2021) [online] Available at: <https://www.ibm.com/downloads/cas/QMXVZX6R>. Accessed 26 July 2021
Jakobsson E, Myers E (2006) Phishing and Counter-Measures: Understanding the Increasing Problem of Electronic Identity Theft. Wiley, pp 2–3
Karnik R, Bhandari GM (2016) Support vector machine based malware and phishing website detection. IJCAT-International J Comput Technol 3(5):295–300
Mamun MSI, Rathore MA, Lashkari AH, Stakhanova N, Ghorbani AA (2016) Detecting malicious URLs using lexical analysis. In: Chen J, Piuri V, Su C, Yung M (eds) Network and system security: 10th international conference, NSS 2016, Taipei, Taiwan, September 28–30, 2016, proceedings. Springer International Publishing, Cham, pp 467–482
Marchal S, Franois J, State R, Engel T (2014) PhishStorm: detecting phishing with streaming analytics. IEEE Trans Netw Serv Manag 11(4):458–471
Nguyen HH, Nguyen DT (2016) Machine learning based phishing web sites detection. In: Duy VH, Dao TT, Zelinka I, Choi H-S, Chadli M (eds) AETA 2015: recent advances in electrical engineering and related sciences. Springer International Publishing, Cham, pp 123–131
Nguyen LAT, To BL, Nguyen HK, Nguyen MH (2013) Detecting phishing web sites: A heuristic URL-based approach, In: 2013 International Conference on Advanced Technologies for Communications (ATC 2013), pp. 597–602
Rao RS, Ali ST (2015) PhishShield: A Desktop Application to Detect Phishing Webpages through Heuristic Approach. Procedia Comput Sci 54(Supplement C):147–156
Rosenthal M (2021) Phishing statistics (updated 2021) - 50+ important phishing stats - Tessian. [online] Tessian. Available at: <https://www.tessian.com/blog/phishing-statistics-2020/>. Accessed 26 July 2021
Sanglerdsinlapachai N, Rungsawang A (2010) Web phishing detection using classifier ensemble, New York, NY, USA, pp. 210–215
Sonicwall.com. (2021) [online] Available at: <https://www.sonicwall.com/medialibrary/en/white-paper/2019-sonicwall-cyber-threat-report.pdf>. Accessed 26 July 2021
Tang L, Mahmoud QH (2021) A survey of machine learning-based solutions for phishing website detection. Mach Learn Knowl Extr 3(3):672–694
Transparencyreport.google.com. (2021) Google Transparency Report. [online] Available at: <https://transparencyreport.google.com/safe-browsing/overview?unsafe=dataset:1;series:malware,phishing;start:1579219200000;end:1611791999999&lu=unsafe>. Accessed 26 July 2021
URL Feature Extractor (n.d.), https://github.com/lucasayres/url-feature-extractor. Accessed 26 July 2021
Verizon Enterprise Solutions. (2021) 2021 Data Breach Investigations Report (DBIR). [online] Available at: <https://enterprise.verizon.com/resources/reports/2021/2021-data-breach-investigations-report.pdf>. Accessed 26 July 2021
Weedon M, Tsaptsinos D, Denholm-Price J (2017) Random Forest explorations for URL classification. In: 2017 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA), pp. 1–4
Yang P, Zhao G, Zeng P (2019) Phishing website detection based on multidimensional features driven by deep learning. IEEE Access 7:15196–15209
Zhang Y, Hong JI, Cranor LF (2007) Cantina: a content-based approach to detecting phishing websites. In: Proceedings of the 16th international conference on World Wide Web, WWW’ 07, New York, pp 639–648. https://doi.org/10.1145/1242572.1242659
Zhang Z, He Q, Wang B (2017) A Novel Multi-Layer Heuristic Model for Anti-Phishing, New York, NY, USA, p. 21:1–21:6
Code availability
Not applicable.
Funding
Not applicable.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest/Competing interests
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jha, A.K., Muthalagu, R. & Pawar, P.M. Intelligent phishing website detection using machine learning. Multimed Tools Appl 82, 29431–29456 (2023). https://doi.org/10.1007/s11042-023-14731-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-14731-4