CatchPhish: detection of phishing websites by inspecting URLs

Rao, Routhu Srinivasa; Vaishnavi, Tatti; Pais, Alwyn Roshan

doi:10.1007/s12652-019-01311-4

Original Research
Published: 10 May 2019

Volume 11, pages 813–825, (2020)
Journal of Ambient Intelligence and Humanized Computing

Routhu Srinivasa Rao ORCID: orcid.org/0000-0001-5588-0218¹,
Tatti Vaishnavi² &
Alwyn Roshan Pais¹

Abstract

Many existing anti-phishing techniques use source code-based features and third-party services to detect phishing sites. These techniques have limitations, one of which is that they fail to handle drive-by downloads. They also rely on third-party services to detect phishing URLs, which delays the classification process. Hence, in this paper, we propose a lightweight application, CatchPhish, which predicts the legitimacy of a URL without visiting the website. The proposed technique extracts hostname, full-URL, Term Frequency-Inverse Document Frequency (TF-IDF), and phish-hinted word features from the suspicious URL and classifies it with a Random Forest classifier. With only TF-IDF features, the proposed model achieved an accuracy of 93.25% on our dataset. Combining TF-IDF with hand-crafted features achieved an accuracy of 94.26% on our dataset and accuracies of 98.25% and 97.49% on the benchmark datasets, which is much better than the existing baseline models.
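
The feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the specific features, the phish-hinted word list (`PHISH_HINTS`), and the function names are assumptions chosen for illustration, and the TF-IDF weighting is the plain tf × log(N/df) form computed over URL tokens. The Random Forest step is left out; in practice the resulting features would be passed to a classifier.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical phish-hinted words; the paper's actual list is not reproduced here.
PHISH_HINTS = {"login", "verify", "secure", "account", "update", "signin"}


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def handcrafted_features(url):
    """Illustrative hostname and full-URL features (assumed, not the paper's set)."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "hint_words": sum(t in PHISH_HINTS for t in tokenize(url)),
    }


def tfidf(urls):
    """Return one {token: weight} dict per URL, using tf * log(N / df)."""
    docs = [Counter(tokenize(u)) for u in urls]
    # Document frequency: number of URLs in which each token appears.
    df = Counter(tok for doc in docs for tok in doc)
    n = len(docs)
    return [
        {
            tok: (cnt / sum(doc.values())) * math.log(n / df[tok])
            for tok, cnt in doc.items()
        }
        for doc in docs
    ]


# Example URLs are made up for the sketch.
urls = [
    "http://paypal.com.secure-login.example.net/verify",
    "https://www.example.org/about",
]
features = [handcrafted_features(u) for u in urls]
weights = tfidf(urls)
```

Everything here is computed from the URL string alone, which matches the abstract's point that legitimacy is predicted without visiting the website. Tokens shared by every URL in the collection receive a weight of zero under this weighting, so rarer tokens carry the signal.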

Acknowledgements

The authors would like to thank the Ministry of Electronics and Information Technology (MeitY), Government of India, for supporting part of this research.

Author information

Authors and Affiliations

Information Security Research Lab, Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, Karnataka, 575025, India
Routhu Srinivasa Rao & Alwyn Roshan Pais
Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal, Karnataka, 576104, India
Tatti Vaishnavi

Corresponding author

Correspondence to Routhu Srinivasa Rao.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rao, R.S., Vaishnavi, T. & Pais, A.R. CatchPhish: detection of phishing websites by inspecting URLs. J Ambient Intell Human Comput 11, 813–825 (2020). https://doi.org/10.1007/s12652-019-01311-4

Received: 20 October 2018
Accepted: 28 April 2019
Published: 10 May 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s12652-019-01311-4