Malware Classification by Deep Learning Using Characteristics of Hash Functions

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 450)

Abstract

As the Internet develops, the number of Internet of Things (IoT) devices increases. Simultaneously, the risk of IoT devices being infected with malware also increases. Thus, malware detection has become an important issue. Dynamic analysis logs are effective at detecting malware, but it takes time to collect a large amount of data because the malware must be executed at least once before the logs can be collected. Moreover, dynamic analysis logs are affected by external factors such as the execution environment. A malware detection method that uses a static property analysis log could solve these problems. In this study, deep learning (DL) was used as a machine learning method because DL is effective for large-scale data and can automatically extract features.

Research has been conducted on malware detection using static properties of portable executable (PE) files, establishing that such detection is possible. However, research on malware detection using hash functions such as fuzzy hashes and peHash is lacking. Therefore, we investigated the characteristics of hash values in malware classification. Moreover, when surface analysis logs are viewed in chronological order, the data are considered to exhibit concept drift. Therefore, we compared malware detection performance using data with the concept drift property. We found that hash values could be used to prevent performance degradation even on data with concept drift. In an experiment combining PE surface information and hash values, the combination showed the highest performance on certain concept drift data.
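
The abstract does not say how hash values are fed to the network or how the chronological evaluation is arranged, so the following is only a minimal sketch under stated assumptions. It assumes an ssdeep-style digest string is encoded character by character into a fixed-length integer vector, and that concept drift is emulated by training on older samples and testing on newer ones. The names `encode_hash` and `chronological_split`, the record fields, and the sample digests are all hypothetical and are not taken from the paper.

```python
from datetime import date

# Base64 alphabet used in ssdeep-style digests, plus ':' as the field separator.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/:"
# Id 0 is reserved for padding; characters outside ALPHABET also map to 0.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_hash(digest, length=64):
    """Map a hash string to a fixed-length list of integer character ids.

    Longer digests are truncated and shorter ones are right-padded with 0.
    """
    ids = [CHAR_TO_ID.get(ch, 0) for ch in digest[:length]]
    return ids + [0] * (length - len(ids))


def chronological_split(samples, cutoff):
    """Train on samples first seen before `cutoff`, test on the rest.

    Splitting by date instead of at random exposes any drift between
    older and newer samples.
    """
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test


# Hypothetical records: digests, dates, and labels are made up for illustration.
samples = [
    {"digest": "3:AXGBicFlgVNhBGcL6wCrFQEv:AXGHsNhxLsr2C",
     "first_seen": date(2019, 5, 1), "label": 1},
    {"digest": "3:AXGBicFlIHBGcL6wCrFQEv:AXGH6xLsr2C",
     "first_seen": date(2020, 2, 1), "label": 1},
    {"digest": "6:hello+World/abc:xyz",
     "first_seen": date(2021, 3, 1), "label": 0},
]

train, test = chronological_split(samples, cutoff=date(2020, 1, 1))
X_train = [encode_hash(s["digest"]) for s in train]
X_test = [encode_hash(s["digest"]) for s in test]
y_train = [s["label"] for s in train]
y_test = [s["label"] for s in test]
```

Vectors of this kind could be passed to an embedding layer or concatenated with PE surface features before classification; whether the authors used a comparable encoding cannot be determined from the abstract.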


Acknowledgments

Part of this research was supported by JST, PRESTO Grant Number JPMJPR1938 and JSPS Grants-in-Aid for Scientific Research JP19H05579.

Author information

Corresponding authors

Correspondence to Kensuke Baba or Toshihiro Yamauchi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Baba, T., Baba, K., Yamauchi, T. (2022). Malware Classification by Deep Learning Using Characteristics of Hash Functions. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 450. Springer, Cham. https://doi.org/10.1007/978-3-030-99587-4_40
