Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach

Rao, Routhu Srinivasa; Pais, Alwyn Roshan

doi:10.1007/s12652-019-01637-z

Original Research
Published: 13 December 2019

Journal of Ambient Intelligence and Humanized Computing, Volume 11, pages 3853–3872 (2020)

Abstract

Visual similarity-based techniques detect phishing sites by comparing a suspicious site against an existing database of resources such as screenshots, styles, logos, and favicons. These techniques fail when a phishing site targets a non-whitelisted legitimate domain or presents manipulated content from a whitelisted legitimate site. They are also poorly suited to client-side deployment because of their computation and space complexity. Thus there is a need for a lightweight visual similarity-based technique that detects phishing sites targeting non-whitelisted legitimate resources. Unlike traditional visual similarity-based techniques that use whitelists, in this paper we employed a lightweight visual similarity-based blacklist approach as a first-level filter for detecting near-duplicate phishing sites. For non-blacklisted phishing sites, we incorporated a heuristic mechanism as a second-level filter. We used two fuzzy similarity measures, Simhash and perceptual hash, to calculate the similarity score between the suspicious site and existing blacklisted phishing sites. Each similarity measure generates a unique fingerprint for a given website, and the fingerprints of similar websites differ in only a small number of bits. All three fingerprints together represent a website, which undergoes blacklist filtering to identify the target website. Phishing sites that bypass the first-level filter undergo second-level heuristic filtering. We used comprehensive heuristic features, including URL and source code-based features, to detect non-blacklisted phishing sites. The experimental results demonstrate that the blacklist filter alone is able to detect 55.58% of phishing sites, which are either replicas or near duplicates of existing phishing sites.
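As a rough illustration of the fingerprint matching described above, the following Python sketch computes a 64-bit Simhash over word tokens and checks it against a blacklist by Hamming distance. The tokenization, the MD5-based token hash, the `max_distance` threshold of 3 bits, and all function names are assumptions made for this example; the abstract does not specify them, and the perceptual hash component is not shown.

```python
import hashlib
import re

SIMHASH_BITS = 64  # assumed fingerprint width


def simhash(text: str) -> int:
    """Compute a 64-bit Simhash fingerprint over lowercase word tokens."""
    weights = [0] * SIMHASH_BITS
    for token in re.findall(r"\w+", text.lower()):
        # Stable 64-bit token hash (MD5 is used only for determinism, not security).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(SIMHASH_BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def first_level_filter(fingerprint: int, blacklist: dict, max_distance: int = 3):
    """Return the label of the closest blacklisted fingerprint within
    max_distance bits, or None if no entry is close enough."""
    best = None
    for label, known in blacklist.items():
        d = hamming_distance(fingerprint, known)
        if d <= max_distance and (best is None or d < best[1]):
            best = (label, d)
    return best[0] if best else None
```

Under these assumptions, a site whose fingerprint falls within the threshold of a blacklisted entry would be flagged at the first level; any other site would pass on to the heuristic filter.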
We also proposed an ensemble model combining Random Forest (RF), Extra-Tree, and XGBoost to evaluate the contribution of the blacklist and heuristic filters together as a single entity; the model achieved an accuracy of 98.72% and a Matthews Correlation Coefficient (MCC) of 97.39%. The proposed model is deployed as a Chrome extension named BlackPhish to provide real-time protection against phishing sites at the client side. We also compared BlackPhish with existing anti-phishing techniques, and it outperformed them by a significant margin in accuracy and MCC.
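For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and MCC from confusion-matrix counts using their standard definitions. The function names and the example counts are hypothetical and are not taken from the paper's experiments. MCC ranges from -1 to 1; the abstract reports it as a percentage.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 90 phishing sites caught, 10 missed,
# 95 legitimate sites passed, 5 wrongly flagged.
print(f"accuracy = {accuracy(90, 95, 5, 10):.4f}")
print(f"MCC      = {mcc(90, 95, 5, 10):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to an imbalance between phishing and legitimate samples.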

Acknowledgements

The authors would like to thank the Ministry of Electronics & Information Technology (MeitY), Government of India, for supporting part of this research.

Author information

Authors and Affiliations

Information Security Research Lab, Department of Computer Science and Engineering, National Institute of Technology, Surathkal, Karnataka, 575025, India
Routhu Srinivasa Rao & Alwyn Roshan Pais

Corresponding author

Correspondence to Routhu Srinivasa Rao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rao, R.S., Pais, A.R. Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach. J Ambient Intell Human Comput 11, 3853–3872 (2020). https://doi.org/10.1007/s12652-019-01637-z

Received: 01 July 2019
Accepted: 06 December 2019
Published: 13 December 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s12652-019-01637-z
