Abstract
This paper presents a novel approach that can detect phishing attack by analysing the hyperlinks found in the HTML source code of the website. The proposed approach incorporates various new outstanding hyperlink specific features to detect phishing attack. The proposed approach has divided the hyperlink specific features into 12 different categories and used these features to train the machine learning algorithms. We have evaluated the performance of our proposed phishing detection approach on various classification algorithms using the phishing and non-phishing websites dataset. The proposed approach is an entirely client-side solution, and does not require any services from the third party. Moreover, the proposed approach is language independent and it can detect the website written in any textual language. Compared to other methods, the proposed approach has relatively high accuracy in detection of phishing websites as it achieved more than 98.4% accuracy on logistic regression classifier.
Similar content being viewed by others
References
Abu-Nimeh S, Nappa D, Wang X, Nair S (2007). A comparison of machine learning techniques for phishing detection. In: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit, Pittsburgh, pp 60–69
Aburrous M, Hossain MA, Thabatah F, Dahal K (2010) Intelligent phishing detection system for e-banking using fuzzy data mining. Expert Syst Appl 37(12):7913–7921
Alexa top websites (2018) http://www.alexa.com/topsites. Retrieved 22 Aug 2017
APWG H1 2017 Report (2017) http://docs.apwg.org/reports/apwg_trends_report_h1_2017.pdf. Retrieved 25 March 2018
Bhuiyan MZA, Wu J, Wang G, Cao J (2016) Sensing and decision making in cyber-physical systems: the case of structural event monitoring. IEEE Trans Ind Inform 12(6):2103–2114
El-Alfy E-SM (2017) Detection of phishing websites based on probabilistic neural networks and K-Medoids clustering. Comput J. https://doi.org/10.1093/comjnl/bxx035
Fan L, Lei X, Yang N, Duong TQ, Karagiannidis GK (2016) Secure multiple amplify-and forward relaying with cochannel interference. IEEE J Select Topics Signal Process 10(8):1494–1505
Garera S, Provos N, Chew M, Rubin AD (2007) A framework for detection and measurement of phishing attacks. In: Proceedings of the 2007 ACM workshop on recurring malcode, Alexandria, pp 1–8
Geng G-G, Yang X-T, Wang W, Meng C-J (2014) A taxonomy of hyperlink hiding techniques. In: Asia-Pacific web conference, vol 8709, Lecture Notes in Computer Science. Springer, Suzhou, pp 165–176
Guava libraries, Google Inc. (2018) https://github.com/google/guava. Retrieved 18 Jan 2018
He M, Horng SJ, Fan P, Khan MK, Run RS, Lai JL, Sutanto A (2011) An efficient phishing webpage detector. Expert Syst Appl 38(10):12018–12027
Jain AK, Gupta BB (2016a) Comparative analysis of features based machine learning approaches for phishing detection. In: Proceedings of 3rd international conference on computing for sustainable global development (INDIACom). IEEE, New Delhi, pp 2125–2130
Jain AK, Gupta BB (2016b) A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP J Inf Secur 2016(9)
Jain AK, Gupta BB (2017a) Phishing detection: analysis of visual similarity based approaches. Secur Commun Netw. https://doi.org/10.1155/2017/5421046
Jain AK, Gupta BB (2017b) Two-level authentication approach to protect from phishing attacks in real time. J Ambient Intell Humaniz Comput, 1–14
Jain AK, Gupta BB (2017c). Towards detection of phishing websites on client-side using machine learning based approach. Telecommun Syst, 1–14. https://doi.org/10.1007/s11235-017-0414-0
Jsoup HTML parser (2018) https://jsoup.org/apidocs/org/jsoup/parser/Parser.html. Retrieved 20 Jan 2018
Kumaraguru P, Rhee Y, Acquisti A, Cranor LF, Hong J, Nunge E (2007) Protecting people from phishing: the design and evaluation of an embedded training email system. In: Proceedings of SIGCHI conference on human factors in computing systems, San Jose
Li J, Sun L, Yan Q, Li Z, Srisa-an W, Ye H (2018) Significant permission identification for machine learning based android malware detection. IEEE Trans Ind Inform
Lin Q, Li J, Huang Z, Chen W, Shen J (2018) A short linearly homomorphic proxy signature scheme. IEEE Access
List of online payment service providers (2018) http://research.omicsgroup.org/index.php/List_of_online_payment_service_providers. Retrieved 25 March 2018
Maio CD, Fenza G, Gallo M, Loia V, Parente M (2017) Time-aware adaptive tweets ranking through deep learning. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2017.07.039
Maio CD, Fenza G, Gallo M, Loia V, Parente M (2018) Social media marketing through time-aware collaborative filtering. Concurr Comput Pract Exp 30(1)
Mohammad RM, Thabtah F, McCluskey L (2014) Predicting phishing websites based on self-structuring neural network. Neural Comput Appl 25(2):443–458
Montazera GA, ArabYarmohammadi S (2015) Detection of phishing attacks in Iranian e-banking using a fuzzy–rough hybrid system. Appl Soft Comput 35:482–492
Pan Y, Ding X (2006) Anomaly based web phishing page detection. In: Proceedings of 22nd annual computer security applications conference, Miami Beach, pp 381–392
Phishingpro Report (2016) http://www.razorthorn.co.uk/wp-content/uploads/2017/01/Phishing-Stats-2016.pdf. Retrieved 14 Oct 2017
Phishtank dataset (2018) http://www.phishtank.com. Retrieved 22 Aug 2017
Sheng S, Wardman B, Warner G, Cranor LF, Hong J, Zhang C (2009) An empirical analysis of phishing blacklists. In: Proceedings of the sixth conference on email and anti-spam, Mountain View
Stuffgate Free Online Website Analyzer (2018) http://www.stuffgate.com/. Retrieved 21 Jan 2018
Usage of content languages for websites (2017) https://w3techs.com/technologies/overview/content_language/all. Retrieved 22 Aug 2017
Varshney G, Misra M, Atrey PK (2016) A phish detector using lightweight search features. Comput Secur 62:213–228
Wang YG, Zhu G, Shi YQ (2018) Transportation spherical watermarking. IEEE Trans Image Process 27(4):2063–2077
Whittaker C, Ryner B, Nazif M (2010) Large-scale automatic classification of phishing pages. In: Proceedings of the network and distributed system security symposium, San Diego, pp 1–14
Xiang G, Hong J, Rose CP, Cranor L (2011) CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans Inf Syst Secur 14(2)
Zhang Y, Hong JI, Cranor LF (2007) CANTINA: a content-based approach to detecting phishing websites. In: Proceedings of 16th international world wide web conference (WWW2007), Banff, pp 639–648
Zhang W, Jiang Q, Chen L, Li C (2017) Two-stage ELM for phishing Web pages detection using hybrid features. World Wide Web 20(4):797–813
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Jain, A.K., Gupta, B.B. A machine learning based approach for phishing detection using hyperlinks information. J Ambient Intell Human Comput 10, 2015–2028 (2019). https://doi.org/10.1007/s12652-018-0798-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12652-018-0798-z