Abstract
Beyond its benefits, the World Wide Web has also become a major platform for online criminal activity. Traditional protection methods against malicious URLs, such as blacklisting, remain valid but cannot detect previously unseen sites; hence, new machine learning approaches are being developed for automatic detection. This paper extends the existing state of the art by proposing an alternative machine learning approach that uses a set of 14 lexical and host-based features and focuses on the typical mechanisms employed by malicious URLs. The proposed method uses random forest and decision tree classifiers as its core mechanisms. Evaluated on a combined dataset of benign and malicious URLs, it achieves an accuracy of over 97%.
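The abstract does not list the 14 lexical and host-based features. As a purely illustrative sketch, the Python code below extracts ten hypothetical lexical features using only the standard library: URL, hostname and path length; counts of dots, hyphens, digits and '@' signs; the number of query parameters; HTTPS usage; and whether the host is a raw IP address. These feature names are assumptions, not the paper's feature set. Host-based features, which would need DNS or WHOIS lookups, are omitted.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical lexical features for illustration only; these are NOT
# the 14 features used in the paper, which the abstract does not list.
FEATURE_NAMES = [
    "url_length",
    "hostname_length",
    "path_length",
    "num_dots",
    "num_hyphens",
    "num_digits",
    "num_at_signs",
    "num_query_params",
    "uses_https",
    "host_is_ip",
]


def _host_is_ip(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict[str, int]:
    """Map a URL string to a dictionary of simple lexical features."""
    parsed = urlparse(url)
    # .hostname drops any userinfo ("user@") and port, and lower-cases the host.
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Text before '@' in the authority is userinfo, so a URL such as
        # "http://bank.com@other.host/" actually points at other.host.
        "num_at_signs": url.count("@"),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(_host_is_ip(host)),
    }


def to_vector(url: str) -> list[int]:
    """Return the features in a fixed order, one column per feature."""
    feats = lexical_features(url)
    return [feats[name] for name in FEATURE_NAMES]
```

For example, `to_vector("http://192.168.0.1/login.php?user=a&id=1")` yields `[40, 11, 10, 4, 0, 9, 0, 2, 0, 1]`. Vectors like these, paired with benign/malicious labels, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` or `DecisionTreeClassifier`, matching the two model families the abstract names. The fixed feature order keeps every row aligned column by column.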
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Tung, S.P., Wong, K.Y., Kuzminykh, I., Bakhshi, T., Ghita, B. (2022). Using a Machine Learning Model for Malicious URL Type Detection. In: Koucheryavy, Y., Balandin, S., Andreev, S. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. NEW2AN ruSMART 2021. Lecture Notes in Computer Science, vol 13158. Springer, Cham. https://doi.org/10.1007/978-3-030-97777-1_41
DOI: https://doi.org/10.1007/978-3-030-97777-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97776-4
Online ISBN: 978-3-030-97777-1
eBook Packages: Computer Science, Computer Science (R0)