Lexical Mining of Malicious URLs for Classifying Android Malware

  • Conference paper

Abstract

The prevalence of mobile malware has become a growing issue given the tight integration of mobile systems with our daily life. Most malware programs embed URLs in their network traffic to deliver the commands that launch malicious activities. Therefore, detecting malicious URLs can be essential in deterring such activities. Traditional methods identify malicious URLs with blacklists of verified URLs, but their effectiveness is impaired by malicious URLs that are not yet known. More recently, machine learning-based methods have been proposed for malware detection and have shown improved performance. In this paper, we propose a novel URL detection method based on the Floating Centroids Method (FCM), which integrates supervised classification and unsupervised clustering in a coherent manner. The proposed method uses the lexical features of a URL to identify malicious URLs while grouping similar URLs into the same cluster. Our experimental results show that each URL cluster exhibits unique behavioral patterns that can be used for malware detection with high accuracy. Moreover, the proposed behavioral clustering method facilitates the identification of malicious URL categories and unseen malware variants.
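The abstract does not list the paper's exact lexical features or the details of FCM training. The following is therefore a minimal, hypothetical sketch of the general idea of combining unsupervised clustering with supervised labels on URL lexical features. The sketch makes these assumptions:

- **Features.** Each URL is turned into seven made-up lexical features: lengths, dot and digit counts, query characters, path depth, and a crude IP-host flag.
- **Clustering.** The vectors are min-max scaled and grouped with plain k-means.
- **Labeling.** Each centroid takes the majority label of its members.
- **Classification.** A new URL receives the label of its nearest centroid.

It omits any learned feature mapping or optimization step that FCM itself may use, so it is not the authors' method. The names `lexical_features`, `train`, `TRAIN_URLS` and the toy URLs are all illustrative.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url):
    """Hypothetical lexical feature vector for one URL (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                                # total URL length
        float(len(host)),                               # hostname length
        float(host.count(".")),                         # dots in the hostname
        float(sum(c.isdigit() for c in url)),           # digits anywhere in the URL
        float(len(re.findall(r"[=&?%]", url))),         # query/encoding characters
        float(parsed.path.count("/")),                  # path depth
        1.0 if re.fullmatch(r"[\d.]+", host) else 0.0,  # crude raw-IPv4-host flag
    ]


def fit_scaler(rows):
    """Min-max scaling fitted on training rows so no feature dominates the distance."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]
    return lambda r: [(x - a) / (b - a) if b > a else 0.0 for x, a, b in zip(r, lo, hi)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points for determinism (sketch only)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def train(urls, labels, k=2):
    """Cluster the feature vectors, then give each centroid its members' majority label."""
    raw = [lexical_features(u) for u in urls]
    scale = fit_scaler(raw)
    points = [scale(r) for r in raw]
    centroids = kmeans(points, k)
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    # Drop centroids that ended up with no members (empty Counter is falsy).
    labeled = [(c, v.most_common(1)[0][0]) for c, v in zip(centroids, votes) if v]

    def classify(url):
        # A new URL takes the label of its nearest labeled centroid.
        p = scale(lexical_features(url))
        return min(labeled, key=lambda cy: dist2(p, cy[0]))[1]

    return classify


# Toy training data (hypothetical); order matters only because of the simple seeding.
TRAIN_URLS = [
    "https://www.google.com/",
    "http://192.168.13.7/cmd.php?id=8841&key=93a7f&op=upload",
    "https://developer.android.com/studio",
    "http://10.0.0.77/gate.php?bot=5521&act=get&v=2",
    "https://www.wikipedia.org/",
    "http://203.0.113.9/a/b/c/p.php?u=1234567&t=99",
]
TRAIN_LABELS = ["benign", "malicious"] * 3

if __name__ == "__main__":
    classify = train(TRAIN_URLS, TRAIN_LABELS)
    print(classify("http://198.51.100.23/upd.php?id=777&c=run"))
    print(classify("https://www.example.org/about"))
```

On this toy data, the IP-host URL with query parameters lands in the malicious cluster and the `example.org` URL lands in the benign one. With more clusters than classes (k greater than 2), several centroids can share a label, which is what allows clusters to stand for distinct URL categories. Real feature choices, seeding, and the number of clusters would all need proper evaluation.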

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61572230, the Shandong Provincial Key R&D Program under Grants No. 2016GGX101001 and No. 2018CXGC0706, and the CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404. This work was also supported in part by NSF grant CNS-1566388.

Author information

Correspondence to Zhenxiang Chen.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wang, S. et al. (2018). Lexical Mining of Malicious URLs for Classifying Android Malware. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_14

  • DOI: https://doi.org/10.1007/978-3-030-01701-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01700-2

  • Online ISBN: 978-3-030-01701-9

  • eBook Packages: Computer Science, Computer Science (R0)
