Lexical Mining of Malicious URLs for Classifying Android Malware

Wang, Shanshan; Yan, Qiben; Chen, Zhenxiang; Wang, Lin; Spolaor, Riccardo; Yang, Bo; Conti, Mauro

doi:10.1007/978-3-030-01701-9_14

Lexical Mining of Malicious URLs for Classifying Android Malware

Shanshan Wang¹⁹,
Qiben Yan²⁰,
Zhenxiang Chen¹⁹,
Lin Wang¹⁹,
Riccardo Spolaor²¹,
Bo Yang¹⁹ &
…
Mauro Conti²¹

Conference paper
First Online: 29 December 2018

1348 Accesses
3 Citations

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ((LNICST,volume 254))

Abstract

The prevalence of mobile malware has become a growing issue given the tight integration of mobile systems with our daily life. Most malware programs use URLs inside network traffic to forward commands to launch malicious activities. Therefore, the detection of malicious URLs can be essential in deterring such malicious activities. Traditional methods construct blacklists with verified URLs to identify malicious URLs, but their effectiveness is impaired by unknown malicious URLs. Recently, machine learning-based methods have been proposed for malware detection with improved performance. In this paper, we propose a novel URL detection method based on Floating Centroids Method (FCM), which integrates supervised classification and unsupervised clustering in a coherent manner. The proposed method uses the lexical features of a URL to effectively identify malicious URLs while grouping similar URLs into the same cluster. Our experimental results show that a URL cluster exhibits unique behavioral patterns that can be used for malware detection with high accuracy. Moreover, the proposed behavioral clustering method facilitates the identification of malicious URL categories and unseen malware variants.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Security threat report 2014. http://www.sophos.com/en-us/medialibrary/PDFs/other/sophossecurity-threat-report-2014.pdf
Wang, L., et al.: Improvement of neural network classifier using floating centroids. Knowl. Inf. Syst. 31(3), 433–454 (2012)
Article Google Scholar
Specification of malicious url 2013. http://www.antiy.net/p/specification-of-malicious-url
Canopy clustering algorithm. https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
Wu, D.J., Mao, C.H., Lee, H.M., Wu, K.P.: Droidmat: android malware detection through manifest and api calls tracing. In: Information Security, pp. 62–69 (2012)
Google Scholar
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K.: DREBIN: effective and explainable detection of android malware in your pocket. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2014)
Google Scholar
Yang, C., Xu, Z., Gu, G., Yegneswaran, V., Porras, P.: DroidMiner: automated mining and characterization of fine-grained malicious behaviors in android applications. In: Kutyłowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8712, pp. 163–182. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11203-9_10
Chapter Google Scholar
Yan, L.K., Yin, H.: DroidScope: seamlessly reconstructing the OS and Dalvik semantic views for dynamic android malware analysis. In: Proceedings of the 21st USENIX Conference on Security Symposium, p. 29 (2013)
Google Scholar
Rastogi, V., Chen, Y., Enck, W.: AppsPlayground: automatic security analysis of smartphone applications. In: ACM Conference on Data and Application Security and Privacy, pp. 209–220 (2013)
Google Scholar
Narudin, F.A., Feizollah, A., Anuar, N.B., Gani, A.: Evaluation of machine learning classifiers for mobile malware detection. Soft Comput. 20(1), 1–15 (2016)
Article Google Scholar
Xu, Q., et al.: Automatic generation of mobile app signatures from traffic observations. In: Computer Communications, pp. 1481–1489 (2015)
Google Scholar
Wang, S., Chen, Z., Zhang, L., Yan, Q., Yang, B.: Trafficav: an effective and explainable detection of mobile malware behavior using network traffic. In: Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQOS), pp. 1–6 (2016)
Google Scholar
Pizzato, L., Rej, T., Chung, T., Koprinska, I., Kay, J.: RECON: a reciprocal recommender for online dating. In: ACM Conference on Recommender Systems, pp. 207–214 (2010)
Google Scholar
Wei, X., Neamtiu, I., Faloutsos, M.: Whom does your android app talk to? In: Global Communications Conference (GLOBECOM), pp. 1–6. IEEE (2015)
Google Scholar
Shabtai, A., Tenenboim-Chekina, L., Mimran, D., Rokach, L., Shapira, B., Elovici, Y.: Mobile malware detection through analysis of deviations in application network behavior. Comput. Secur. 43(6), 1–18 (2014)
Article Google Scholar
Gorla, A., Tavecchia, I., Gross, F., Zeller, A.: Checking app behavior against app descriptions. In: Proceedings of the 36th International Conference on Software Engineering, pp. 1025–1035. ACM (2014)
Google Scholar
Android monkey tool. http://developer.android.com/tools/help/monkey.html
Tshark - dump and analyze network traffic. https://www.wireshark.org/docs/man-pages/tshark.html
Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Fourteenth International Conference on Machine Learning, pp. 412–420 (1997)
Google Scholar
PSO tutorial. http://www.swarmintelligence.org/tutorials.php
Virusshare.com - because sharing is caring. https://virusshare.com/
Virustotal. https://www.virustotal.com/
Aranganayagi, S., Thangavel, K.: Clustering categorical data using silhouette coefficient as a relocating measure. In: Conference on Computational Intelligence and Multimedia Applications. International Conference on, vol. 2, pp. 13–17. IEEE (2007)
Google Scholar
Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In: Usenix Conference on Networked Systems Design and Implementation, p. 26 (2010)
Google Scholar
Wang, S., Yan, Q., Chen, Z., Yang, B., Zhao, C., Conti, M.: Detecting android malware leveraging text semantics of network flows. IEEE Trans. Inf. Forensics Secur. PP(99), 1 (2017)
Google Scholar

Download references

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61572230, the Shandong Provincial Key R&D Program under Grant No. 2016GGX101001 and No. 2018CXGC0706, CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404. This work is also supported in part by NSF grant CNS-1566388.

Author information

Authors and Affiliations

Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
Shanshan Wang, Zhenxiang Chen, Lin Wang & Bo Yang
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
Qiben Yan
Department of Mathematics, University of Padova, Padua, Italy
Riccardo Spolaor & Mauro Conti

Authors

Shanshan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Qiben Yan
View author publications
You can also search for this author in PubMed Google Scholar
Zhenxiang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Lin Wang
View author publications
You can also search for this author in PubMed Google Scholar
Riccardo Spolaor
View author publications
You can also search for this author in PubMed Google Scholar
Bo Yang
View author publications
You can also search for this author in PubMed Google Scholar
Mauro Conti
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhenxiang Chen .

Editor information

Editors and Affiliations

Klaus Advanced Computing Building, Georgia Institute of Technology, Atlanta, GA, USA
Raheem Beyah
Singapore Management University, Singapore, Singapore
Bing Chang
School of Information Systems, Singapore Management University, Singapore, Singapore
Yingjiu Li
Pennsylvania State University, University Park, PA, USA
Sencun Zhu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, S. et al. (2018). Lexical Mining of Malicious URLs for Classifying Android Malware. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_14

Download citation

DOI: https://doi.org/10.1007/978-3-030-01701-9_14
Published: 29 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01700-2
Online ISBN: 978-3-030-01701-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics