A heuristic technique to detect phishing websites using TWSVM classifier

Rao, Routhu Srinivasa; Pais, Alwyn Roshan; Anand, Pritam

doi:10.1007/s00521-020-05354-z

Original Article
Published: 24 September 2020

Volume 33, pages 5733–5752 (2021)
Neural Computing and Applications

Abstract

Phishing websites are on the rise, and many are hosted on compromised domains so that the legitimate behavior of those domains is embedded in the phishing site to evade detection. Traditional heuristic techniques based on HTTPS, search engine results, page ranking and WHOIS information may fail to detect phishing sites hosted on compromised domains. Moreover, list-based techniques fail to detect phishing sites when the target website is not in the whitelist. In this paper, we propose a novel heuristic technique using the twin support vector machine (TWSVM) to detect both maliciously registered phishing sites and sites hosted on compromised servers, overcoming the aforementioned limitations. Our technique detects phishing websites hosted on compromised domains by comparing the log-in page and the home page of the visited website. Hyperlink and URL-based features are used to detect phishing sites that are maliciously registered. We used different versions of support vector machines (SVMs) to classify phishing websites and found that the TWSVM classifier outperformed the other versions, achieving an accuracy of 98.05% and a recall of 98.33%.
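The abstract names TWSVM as the best-performing classifier but does not state its formulation. As a rough illustration only, here is a minimal pure-Python sketch of the linear TWSVM of Jayadeva et al. (2007): two non-parallel planes are fitted, each close to one class and at least unit distance from the other, and a test point is assigned to the class whose plane is nearer. The dual problems are solved here by projected coordinate descent, which is a choice made for this sketch and not necessarily the solver the authors used. The names (`LinearTWSVM`, `_plane`, `_solve`, `_dist`), the label convention (+1 phishing, -1 legitimate), the penalties `c1`/`c2` and the ridge term `eps` are all hypothetical; the paper's hyperlink and URL features are not reproduced, so the demo uses synthetic 2-D points.

```python
import math
import random

# Minimal linear TWSVM sketch (after Jayadeva et al., 2007). Pure Python, no NumPy.
# Assumptions: labels are +1 (phishing) and -1 (legitimate); c1, c2, eps and the
# coordinate-descent solver are illustrative choices, not the paper's settings.


def _solve(M, rhs):
    """Solve M X = rhs (M: d x d, rhs: d x k) by Gauss-Jordan elimination."""
    d = len(M)
    aug = [M[i][:] + rhs[i][:] for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def _plane(P, N, c, eps=1e-6, sweeps=200):
    """Fit one plane u = [w, b]: close to rows of P, at least 1 away from rows of N.

    P and N are augmented (each row is features + [1.0]). The dual
        min 0.5 a^T Q a - sum(a)  s.t. 0 <= a <= c,  Q = N (P^T P + eps I)^-1 N^T
    is solved by projected coordinate descent; then u = -(P^T P + eps I)^-1 N^T a.
    """
    d, m = len(P[0]), len(N)
    PtP = [[sum(p[i] * p[j] for p in P) + (eps if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    X = _solve(PtP, [[n[i] for n in N] for i in range(d)])  # d x m
    Q = [[sum(N[i][k] * X[k][j] for k in range(d)) for j in range(m)]
         for i in range(m)]
    a = [0.0] * m
    for _ in range(sweeps):
        for i in range(m):
            grad = sum(Q[i][j] * a[j] for j in range(m)) - 1.0
            a[i] = min(c, max(0.0, a[i] - grad / Q[i][i]))  # project onto [0, c]
    return [-sum(X[k][j] * a[j] for j in range(m)) for k in range(d)]


def _dist(x, u):
    """Perpendicular distance from point x to the plane w.x + b = 0."""
    w, b = u[:-1], u[-1]
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))


class LinearTWSVM:
    def __init__(self, c1=1.0, c2=1.0):
        self.c1, self.c2 = c1, c2

    def fit(self, X, y):
        A = [list(x) + [1.0] for x, t in zip(X, y) if t == 1]   # class +1
        B = [list(x) + [1.0] for x, t in zip(X, y) if t == -1]  # class -1
        self.u1 = _plane(A, B, self.c1)  # plane 1: near A, far from B
        # Plane 2 comes out with the opposite sign to the usual formula; the sign
        # flip does not change the plane or the distances used in predict().
        self.u2 = _plane(B, A, self.c2)
        return self

    def predict(self, X):
        # Assign each point to the class whose plane is nearer.
        return [1 if _dist(x, self.u1) <= _dist(x, self.u2) else -1 for x in X]


if __name__ == "__main__":
    rng = random.Random(0)
    X = ([[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(40)]
         + [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(40)])
    y = [1] * 40 + [-1] * 40
    model = LinearTWSVM().fit(X, y)
    acc = sum(p == t for p, t in zip(model.predict(X), y)) / len(y)
    print(f"training accuracy: {acc:.2f}")
```

Only the plane fitting and the nearest-plane decision rule are illustrated; the log-in page versus home page comparison and the feature extraction described above are not sketched. Pure Python keeps the example dependency-free; for real feature matrices, a linear-algebra library and a dedicated QP solver would be the practical choice.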

Acknowledgements

The authors would like to thank the Ministry of Electronics and Information Technology (MeitY), Government of India, for partially supporting this research.

Author information

Authors and Affiliations

Department of CSE, GMR Institute of Technology, Rajam, Andhra Pradesh, 532127, India
Routhu Srinivasa Rao
Information Security Research Lab, Department of Computer Science and Engineering, National Institute of Technology, Surathkal, Karnataka, 575025, India
Alwyn Roshan Pais
Faculty of Mathematics and Computer Science, South Asian University, New Delhi, 110021, India
Pritam Anand

Corresponding author

Correspondence to Routhu Srinivasa Rao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rao, R.S., Pais, A.R. & Anand, P. A heuristic technique to detect phishing websites using TWSVM classifier. Neural Comput & Applic 33, 5733–5752 (2021). https://doi.org/10.1007/s00521-020-05354-z

Received: 22 January 2019
Accepted: 08 September 2020
Published: 24 September 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00521-020-05354-z
