Using a Machine Learning Model for Malicious URL Type Detection

Tung, Suet Ping; Wong, Ka Yan; Kuzminykh, Ievgeniia; Bakhshi, Taimur; Ghita, Bogdan

doi:10.1007/978-3-030-97777-1_41

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 13158)


Abstract

Beyond its benefits, the World Wide Web has also become a major platform for online criminal activity. Traditional protection methods against malicious URLs, such as blacklisting, remain a valid option but cannot detect previously unknown sites, so new methods based on machine learning are being developed for automatic detection. This paper builds on the existing state of the art by proposing an alternative machine learning approach that uses a set of 14 lexical and host-based features and focuses on the typical mechanisms employed by malicious URLs. The proposed method uses random forest and decision tree classifiers as its core mechanisms and, when evaluated on a combined dataset of benign and malicious URLs, achieves an accuracy of over 97%.
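The abstract does not enumerate the 14 features, so the following is only a rough illustration of what lexical URL feature extraction can look like. It is a minimal sketch under stated assumptions: the feature names (`url_length`, `num_dots`, `host_is_ip`, and so on) and the helper `lexical_features` are hypothetical and are not the paper's feature set. Host-based features (for example WHOIS or DNS properties) require network lookups and are omitted. Vectors like these could then be passed to a decision tree or random forest classifier, such as scikit-learn's `RandomForestClassifier`.

```python
import ipaddress
from urllib.parse import urlsplit


def _host_is_ip(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features for a single URL.

    Hypothetical feature set chosen for illustration only; it is NOT
    the set of 14 lexical and host-based features used in the paper.
    """
    # urlsplit only recognises the host part when a scheme is present,
    # so prepend one for bare URLs such as "example.com/path".
    parsed = urlsplit(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path_segments = [seg for seg in parsed.path.split("/") if seg]
    query_params = [p for p in parsed.query.split("&") if p]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),  # "@" can hide the real host
        "host_is_ip": int(_host_is_ip(host)),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len(path_segments),
        "num_query_params": len(query_params),
    }


if __name__ == "__main__":
    samples = [
        "https://www.example.com/index.html",
        "http://192.168.0.1/login-update/secure/verify.php?id=1&u=2",
    ]
    for u in samples:
        print(u, lexical_features(u))
```

Counting characters and parsing the URL with the standard library keeps each feature cheap to compute, which matters when many URLs must be screened before any host-based lookup is attempted.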



Author information

Authors and Affiliations

University of Plymouth, Drake Circus, Plymouth, PL4 8AA, UK
Suet Ping Tung & Bogdan Ghita
HKU School of Professional and Continuing Education, Kowloon Bay, Kowloon, Hong Kong
Suet Ping Tung & Ka Yan Wong
King’s College London, Strand, London, WC2R 2LS, UK
Ievgeniia Kuzminykh
Kharkiv National University of Radio Electronics, 14 Nauki avenue, Kharkiv, Ukraine
Ievgeniia Kuzminykh
FAST National University of Computer and Emerging Sciences, Lahore, Pakistan
Taimur Bakhshi


Corresponding author

Correspondence to Ievgeniia Kuzminykh.

Editor information

Editors and Affiliations

Tampere University, Tampere, Finland
Yevgeni Koucheryavy
FRUCT Oy, Helsinki, Finland
Sergey Balandin
Tampere University, Tampere, Finland
Sergey Andreev


About this paper

Cite this paper

Tung, S.P., Wong, K.Y., Kuzminykh, I., Bakhshi, T., Ghita, B. (2022). Using a Machine Learning Model for Malicious URL Type Detection. In: Koucheryavy, Y., Balandin, S., Andreev, S. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. NEW2AN ruSMART 2021. Lecture Notes in Computer Science, vol 13158. Springer, Cham. https://doi.org/10.1007/978-3-030-97777-1_41


DOI: https://doi.org/10.1007/978-3-030-97777-1_41
Published: 16 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97776-4
Online ISBN: 978-3-030-97777-1
eBook Packages: Computer Science, Computer Science (R0)
