Abstract
Nowadays, the internet is growing so rapidly that it has changed our daily behavior, from online shopping and online learning to online banking and many other activities that make our lives easier. However, using these services requires sharing personal information such as email addresses, passwords, and credit card details. Cybercriminals exploit the anonymous structure of the internet to find and trick victims in cyberspace. They devise new techniques, such as phishing, which uses fake websites to deceive victims and collect their sensitive information. Determining whether a web page is legitimate or phishing is a challenging problem that deserves attention. In this work, we propose a new model that classifies a web page as legitimate or phishing by applying natural language processing, specifically an n-gram model, to its URL. We evaluate the model with different machine learning algorithms, and our system achieves an accuracy of 96.41% with 97% precision.
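The abstract describes the pipeline only at a high level: n-grams are extracted from URLs and passed to several machine learning algorithms. The following is a minimal sketch of how such a pipeline could look. It rests on stated assumptions rather than on the paper's method. It uses character trigrams (n = 3) and a from-scratch multinomial Naive Bayes classifier in place of the unspecified algorithms. It trains on a handful of invented toy URLs rather than real data, and the names `char_ngrams`, `NGramNaiveBayes`, `accuracy_precision`, `TOY_URLS`, and `TOY_LABELS` are hypothetical. The sketch also shows how the two reported metrics are defined. Accuracy is the fraction of URLs classified correctly. Precision is the fraction of URLs flagged as phishing that really are phishing.

```python
import math
from collections import Counter


def char_ngrams(url, n=3):
    """Split a lower-cased URL into overlapping character n-grams."""
    s = url.lower()
    if len(s) < n:
        # Too short for a full n-gram: keep the whole string as one token.
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (Laplace smoothing).

    Hypothetical stand-in for the unspecified classifiers in the abstract.
    """

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed; the abstract does not give it)
        self.alpha = alpha  # additive smoothing constant

    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}  # n-gram counts per class
        self.totals = {c: 0 for c in self.classes}           # total n-grams per class
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        for url, label in zip(urls, labels):
            grams = char_ngrams(url, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
        # Number of distinct n-grams seen in training, across all classes.
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict_one(self, url):
        grams = char_ngrams(url, self.n)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            denom = self.totals[c] + self.alpha * self.vocab_size
            score = self.log_prior[c]
            for g in grams:
                # Smoothed log P(gram | class); unseen grams get alpha / denom.
                score += math.log((self.counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, urls):
        return [self.predict_one(u) for u in urls]


def accuracy_precision(y_true, y_pred, positive="phishing"):
    """Return (accuracy over all URLs, precision for the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return correct / len(y_true), precision


# Invented toy URLs for illustration only; not drawn from the paper's data.
TOY_URLS = [
    "https://www.google.com/search",
    "https://github.com/login",
    "https://www.wikipedia.org/wiki/url",
    "http://paypa1-secure-login.verify-account.xyz/update",
    "http://secure-update-bank.verify-login.top/account",
    "http://apple-id-verify.account-secure.xyz/signin",
]
TOY_LABELS = ["legitimate"] * 3 + ["phishing"] * 3

if __name__ == "__main__":
    model = NGramNaiveBayes(n=3).fit(TOY_URLS, TOY_LABELS)
    unseen = ["http://verify-account-secure.xyz/login", "https://www.google.com/maps"]
    print(model.predict(unseen))  # prints ['phishing', 'legitimate'] on this toy data
```

The model works at the character level because this needs no tokenizer and still captures fragments such as `verify`, `secure`, or `.xyz`. Word-level n-grams over URL tokens would be an equally plausible reading of the abstract.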
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Elkouay, A., Moussa, N., Madani, A. (2022). Classification of URLs Using N-gram Machine Learning Approach. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-07969-6_7
DOI: https://doi.org/10.1007/978-3-031-07969-6_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07968-9
Online ISBN: 978-3-031-07969-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)