Using a Machine Learning Model for Malicious URL Type Detection

  • Conference paper
Internet of Things, Smart Spaces, and Next Generation Networks and Systems (NEW2AN 2021, ruSMART 2021)

Abstract

The World Wide Web, for all its benefits, has also become a major platform for online criminal activity. Traditional defences against malicious URLs, such as blacklisting, remain a valid option but cannot detect previously unknown sites; hence, new methods based on machine learning are being developed for automatic detection. This paper contributes to the state of the art by proposing an alternative machine learning approach that uses a set of 14 lexical and host-based features but focuses on the typical mechanisms employed by malicious URLs. The proposed method employs random forest and decision tree classifiers as its core mechanisms and is evaluated on a combined dataset of benign and malicious URLs, on which it achieves an accuracy of over 97%.
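To illustrate what "lexical features" of a URL can look like in practice, the following is a minimal sketch using only the Python standard library. The abstract does not enumerate the paper's 14 features, so every feature name below (`url_length`, `num_dots_in_host`, `url_entropy`, and so on) and both helper functions are assumptions chosen for illustration, not the authors' feature set. Host-based features and the classifier itself are left out.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ipv4(host: str) -> bool:
    """True if the host is a literal IPv4 address rather than a domain name."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Hypothetical lexical features; NOT the 14 features used in the paper."""
    # Add a scheme when missing so urlparse can locate the host part.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": sum(not ch.isalnum() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(is_ipv4(host)),
        "url_entropy": shannon_entropy(url),
    }


if __name__ == "__main__":
    for example in ("https://example.com/login", "http://192.168.0.1/a@b"):
        print(example, lexical_features(example))
```

A feature dictionary of this kind could be turned into a numeric vector and passed to a tree-based classifier such as a random forest or a decision tree, which is the family of models the abstract names; the choice of library and hyperparameters is not stated in the abstract.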

Author information

Corresponding author

Correspondence to Ievgeniia Kuzminykh.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Tung, S.P., Wong, K.Y., Kuzminykh, I., Bakhshi, T., Ghita, B. (2022). Using a Machine Learning Model for Malicious URL Type Detection. In: Koucheryavy, Y., Balandin, S., Andreev, S. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. NEW2AN 2021, ruSMART 2021. Lecture Notes in Computer Science, vol 13158. Springer, Cham. https://doi.org/10.1007/978-3-030-97777-1_41

  • DOI: https://doi.org/10.1007/978-3-030-97777-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-97776-4

  • Online ISBN: 978-3-030-97777-1

  • eBook Packages: Computer Science (R0)
