Detecting Malicious URLs Using Lexical Analysis

Mamun, Mohammad Saiful Islam; Rathore, Mohammad Ahmad; Lashkari, Arash Habibi; Stakhanova, Natalia; Ghorbani, Ali A.

doi:10.1007/978-3-319-46298-1_30

Mohammad Saiful Islam Mamun¹⁷,
Mohammad Ahmad Rathore¹⁷,
Arash Habibi Lashkari¹⁷,
Natalia Stakhanova¹⁷ &
…
Ali A. Ghorbani¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 9955))

Included in the following conference series:

International Conference on Network and System Security

2638 Accesses
58 Citations

Abstract

The Web has long become a major platform for online criminal activities. URLs are used as the main vehicle in this domain. To counter this issues security community focused its efforts on developing techniques for mostly blacklisting of malicious URLs. While successful in protecting users from known malicious domains, this approach only solves part of the problem. The new malicious URLs that sprang up all over the web in masses commonly get a head start in this race. Besides that Alexa ranked trusted websites may convey compromised fraudulent URLs called defacement URL. In this work, we explore a lightweight approach to detection and categorization of the malicious URLs according to their attack type. We show that lexical analysis is effective and efficient for proactive detection of these URLs. We provide the set of sufficient features necessary for accurate categorization and evaluate the accuracy of the approach on a set of over 110,000 URLs. We also study the effect of the obfuscation techniques on malicious URLs to figure out the type of obfuscation technique targeted at specific type of malicious URL.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Using web content as features requires downloading and analysis of page contents. Moreover inspecting millions of URL and its contents per unit of time may create a bottleneck.
2.
Access to malicious webpage may cause risk since such webpages may contain malicious content such as Javascript functions.
3.
URLs which originating from pages that are written in server side scripting languages, often have arguments [3]. The longest variable value length from arguments of URL is calculated.

References

Google Safe Browsing Transparency Report (2015). www.google.com/transparencyreport/safebrowsing/
Su, K.-W., et al.: Suspicious URL filtering based on logistic regression with multi-view analysis. In: 8th Asia Joint Conference on Information Security (Asia JCIS). IEEE (2013)
Google Scholar
Le, A., Markopoulou, A., Faloutsos, M.: PhishDef: URL names say it all. In: Proceedings IEEE, INFOCOM. IEEE (2011)
Google Scholar
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Article MathSciNet MATH Google Scholar
Thomas, K., et al.: Design and evaluation of a real-time URL spam filtering service. In: Proceeding of the IEEE Symposium on Security and Privacy (SP) (2011)
Google Scholar
Ma, J., et al.: Identifying suspicious URLs: an application of large-scale online learning. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM (2009)
Google Scholar
Nunan, A.E., et al.: Automatic classification of cross-site scripting in web pages using document-based and URL-based features. In: IEEE Symposium on Computers and Communications (ISCC) (2012)
Google Scholar
Choi, H., Zhu, B.B., Lee, H.: Detecting malicious web links and identifying their attack types. In: Proceedings WebApps (2011)
Google Scholar
Huang, D., Kai, X., Pei, J.: Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17(6), 1375–1394 (2014)
Article Google Scholar
Chu, W., et al.: Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. In: IEEE International Conference on Communications (ICC) (2013)
Google Scholar
Xu, L., et al.: Cross-layer detection of malicious websites. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy. ACM (2013)
Google Scholar
Garera, S., et al.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM Workshop on Recurring Malcode (2007)
Google Scholar
Radu, Vasile: Application. In: Radu, Vasile (ed.) Stochastic Modeling of Thermal Fatigue Crack Growth. ACM, vol. 1, pp. 63–70. Springer, Heidelberg (2015)
Google Scholar
Kevin, M.D., Gupta, M.: Behind phishing: an examination of phisher modi operandi. In: Proceedings of the 1st Usenix Workshop on Large-Scale Ex-ploits and Emergent Threats (2008)
Google Scholar
Ma, J., et al.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2009)
Google Scholar
Davide, C., et al.: Prophiler: a fast filter for the large-scale detection of malicious web pages. In: Proceedings of the 20th International Conference on World Wide Web. ACM (2011)
Google Scholar
Xiang, G., et al.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. (TISSEC) 14(2), 21 (2011)
Article Google Scholar
Abdelhamid, N., Aladdin, A., Thabtah, F.: Phishing detection based associative classification data mining. Expert Syst. Appl. 41(3), 5948–5959 (2014)
Article Google Scholar
Eshete, B., Villafiorita, A., Binspect, K.W.: Holistic Analysis and Detection of Malicious Web Pages. Security and Privacy in Communication Networks. Springer, Heidelberg (2012)
Google Scholar
Cao, C., Caverlee, J.: Behavioral detection of spam URL sharing: posting patterns versus click patterns. In: IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2014)
Google Scholar
Lin, M.-S., et al.: Malicious URL filtering- a big data application. IEEE International Conference on Big Data (2013)
Google Scholar
Enrico, S., Bartoli, A., Medvet, E.: Detection of hidden fraudulent urls within trusted sites using lexical features. In: Proceeding 18th International Conference on Availability, Reliability and Security (ARES). IEEE (2013)
Google Scholar
Garera, S., Provos, N., Chew, M., Rubin, A.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM workshop on Recurring malcode, pp. 1–8. ACM (2007)
Google Scholar
WEBSPAM-UK2007 dataset. http://chato.cl/webspam/datasets/uk2007/
Malware domain dataset. http://www.malwaredomains.com/
OpenPhish dataset. https://openphish.com/
Davanzo, M., Bartoli, A.: Anomaly detection techniques for a web defacement monitoring service. Expert Syst. Appl. (ESWA) 38(10), 12521–12530 (2011)
Article Google Scholar
Zone-h, unrestricted information. http://www.zone-h.org/
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Google Scholar
Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 4, 580–585 (1985)
Article Google Scholar
Liaw, A., Wiener, M.: Classification and regression by randomForest. R news 2(3), 18–22 (2002)
Google Scholar

Download references

Author information

Authors and Affiliations

University of New Brunswick, Fredericton, NB, Canada
Mohammad Saiful Islam Mamun, Mohammad Ahmad Rathore, Arash Habibi Lashkari, Natalia Stakhanova & Ali A. Ghorbani

Authors

Mohammad Saiful Islam Mamun
View author publications
You can also search for this author in PubMed Google Scholar
Mohammad Ahmad Rathore
View author publications
You can also search for this author in PubMed Google Scholar
Arash Habibi Lashkari
View author publications
You can also search for this author in PubMed Google Scholar
Natalia Stakhanova
View author publications
You can also search for this author in PubMed Google Scholar
Ali A. Ghorbani
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mohammad Saiful Islam Mamun .

Editor information

Editors and Affiliations

Central China Normal University , Wuhan, China
Jiageng Chen
Università degli Studi di Milano , Crema (CR), Italy
Vincenzo Piuri
Osaka University , Osaka, Japan
Chunhua Su
Columbia University , New York, New York, USA
Moti Yung

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mamun, M.S.I., Rathore, M.A., Lashkari, A.H., Stakhanova, N., Ghorbani, A.A. (2016). Detecting Malicious URLs Using Lexical Analysis. In: Chen, J., Piuri, V., Su, C., Yung, M. (eds) Network and System Security. NSS 2016. Lecture Notes in Computer Science(), vol 9955. Springer, Cham. https://doi.org/10.1007/978-3-319-46298-1_30

Download citation

DOI: https://doi.org/10.1007/978-3-319-46298-1_30
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46297-4
Online ISBN: 978-3-319-46298-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics