Skip to main content

Detecting Malicious URLs Using Lexical Analysis

  • Conference paper
  • First Online:
Network and System Security (NSS 2016)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 9955))

Included in the following conference series:

Abstract

The Web has long become a major platform for online criminal activities. URLs are used as the main vehicle in this domain. To counter this issues security community focused its efforts on developing techniques for mostly blacklisting of malicious URLs. While successful in protecting users from known malicious domains, this approach only solves part of the problem. The new malicious URLs that sprang up all over the web in masses commonly get a head start in this race. Besides that Alexa ranked trusted websites may convey compromised fraudulent URLs called defacement URL. In this work, we explore a lightweight approach to detection and categorization of the malicious URLs according to their attack type. We show that lexical analysis is effective and efficient for proactive detection of these URLs. We provide the set of sufficient features necessary for accurate categorization and evaluate the accuracy of the approach on a set of over 110,000 URLs. We also study the effect of the obfuscation techniques on malicious URLs to figure out the type of obfuscation technique targeted at specific type of maliciousĀ URL.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Using web content as features requires downloading and analysis of page contents. Moreover inspecting millions of URL and its contents per unit of time may create a bottleneck.

  2. 2.

    Access to malicious webpage may cause risk since such webpages may contain malicious content such as Javascript functions.

  3. 3.

    URLs which originating from pages that are written in server side scripting languages, often have arguments [3]. The longest variable value length from arguments of URL is calculated.

References

  1. Google Safe Browsing Transparency Report (2015). www.google.com/transparencyreport/safebrowsing/

  2. Su, K.-W., et al.: Suspicious URL filtering based on logistic regression with multi-view analysis. In: 8th Asia Joint Conference on Information Security (Asia JCIS). IEEE (2013)

    Google ScholarĀ 

  3. Le, A., Markopoulou, A., Faloutsos, M.: PhishDef: URL names say it all. In: Proceedings IEEE, INFOCOM. IEEE (2011)

    Google ScholarĀ 

  4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5ā€“32 (2001)

    ArticleĀ  MathSciNetĀ  MATHĀ  Google ScholarĀ 

  5. Thomas, K., et al.: Design and evaluation of a real-time URL spam filtering service. In: Proceeding of the IEEE Symposium on Security and Privacy (SP) (2011)

    Google ScholarĀ 

  6. Ma, J., et al.: Identifying suspicious URLs: an application of large-scale online learning. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM (2009)

    Google ScholarĀ 

  7. Nunan, A.E., et al.: Automatic classification of cross-site scripting in web pages using document-based and URL-based features. In: IEEE Symposium on Computers and Communications (ISCC) (2012)

    Google ScholarĀ 

  8. Choi, H., Zhu, B.B., Lee, H.: Detecting malicious web links and identifying their attack types. In: Proceedings WebApps (2011)

    Google ScholarĀ 

  9. Huang, D., Kai, X., Pei, J.: Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17(6), 1375ā€“1394 (2014)

    ArticleĀ  Google ScholarĀ 

  10. Chu, W., et al.: Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. In: IEEE International Conference on Communications (ICC) (2013)

    Google ScholarĀ 

  11. Xu, L., et al.: Cross-layer detection of malicious websites. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy. ACM (2013)

    Google ScholarĀ 

  12. Garera, S., et al.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM Workshop on Recurring Malcode (2007)

    Google ScholarĀ 

  13. Radu, Vasile: Application. In: Radu, Vasile (ed.) Stochastic Modeling of Thermal Fatigue Crack Growth. ACM, vol. 1, pp. 63ā€“70. Springer, Heidelberg (2015)

    Google ScholarĀ 

  14. Kevin, M.D., Gupta, M.: Behind phishing: an examination of phisher modi operandi. In: Proceedings of the 1st Usenix Workshop on Large-Scale Ex-ploits and Emergent Threats (2008)

    Google ScholarĀ 

  15. Ma, J., et al.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2009)

    Google ScholarĀ 

  16. Davide, C., et al.: Prophiler: a fast filter for the large-scale detection of malicious web pages. In: Proceedings of the 20th International Conference on World Wide Web. ACM (2011)

    Google ScholarĀ 

  17. Xiang, G., et al.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. (TISSEC) 14(2), 21 (2011)

    ArticleĀ  Google ScholarĀ 

  18. Abdelhamid, N., Aladdin, A., Thabtah, F.: Phishing detection based associative classification data mining. Expert Syst. Appl. 41(3), 5948ā€“5959 (2014)

    ArticleĀ  Google ScholarĀ 

  19. Eshete, B., Villafiorita, A., Binspect, K.W.: Holistic Analysis and Detection of Malicious Web Pages. Security and Privacy in Communication Networks. Springer, Heidelberg (2012)

    Google ScholarĀ 

  20. Cao, C., Caverlee, J.: Behavioral detection of spam URL sharing: posting patterns versus click patterns. In: IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2014)

    Google ScholarĀ 

  21. Lin, M.-S., et al.: Malicious URL filtering- a big data application. IEEE International Conference on Big Data (2013)

    Google ScholarĀ 

  22. Enrico, S., Bartoli, A., Medvet, E.: Detection of hidden fraudulent urls within trusted sites using lexical features. In: Proceeding 18th International Conference on Availability, Reliability and Security (ARES). IEEE (2013)

    Google ScholarĀ 

  23. Garera, S., Provos, N., Chew, M., Rubin, A.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM workshop on Recurring malcode, pp. 1ā€“8. ACM (2007)

    Google ScholarĀ 

  24. WEBSPAM-UK2007 dataset. http://chato.cl/webspam/datasets/uk2007/

  25. Malware domain dataset. http://www.malwaredomains.com/

  26. OpenPhish dataset. https://openphish.com/

  27. Davanzo, M., Bartoli, A.: Anomaly detection techniques for a web defacement monitoring service. Expert Syst. Appl. (ESWA) 38(10), 12521ā€“12530 (2011)

    ArticleĀ  Google ScholarĀ 

  28. Zone-h, unrestricted information. http://www.zone-h.org/

  29. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)

    Google ScholarĀ 

  30. Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 4, 580ā€“585 (1985)

    ArticleĀ  Google ScholarĀ 

  31. Liaw, A., Wiener, M.: Classification and regression by randomForest. R news 2(3), 18ā€“22 (2002)

    Google ScholarĀ 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mohammad Saiful Islam Mamun .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

Ā© 2016 Springer International Publishing AG

About this paper

Cite this paper

Mamun, M.S.I., Rathore, M.A., Lashkari, A.H., Stakhanova, N., Ghorbani, A.A. (2016). Detecting Malicious URLs Using Lexical Analysis. In: Chen, J., Piuri, V., Su, C., Yung, M. (eds) Network and System Security. NSS 2016. Lecture Notes in Computer Science(), vol 9955. Springer, Cham. https://doi.org/10.1007/978-3-319-46298-1_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46298-1_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46297-4

  • Online ISBN: 978-3-319-46298-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics