Malware Classification by Deep Learning Using Characteristics of Hash Functions

Baba, Takahiro; Baba, Kensuke; Yamauchi, Toshihiro

doi:10.1007/978-3-030-99587-4_40

Conference paper
First Online: 31 March 2022

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 450)

Abstract

As the Internet develops, the number of Internet of Things (IoT) devices increases. Simultaneously, the risk of IoT devices being infected with malware also increases. Thus, malware detection has become an important issue. Dynamic analysis logs are effective at detecting malware, but it takes time to collect a large amount of data because the malware must be executed at least once before the logs can be collected. Moreover, dynamic analysis logs are affected by external factors such as the execution environment. A malware detection method that uses a static property analysis log could solve these problems. In this study, deep learning (DL) was used as a machine learning method because DL is effective for large-scale data and can automatically extract features.

Research has been conducted on malware detection using static properties of portable executable (PE) files, establishing that such detection is possible. However, research on malware detection using hash functions such as Fuzzy hash and peHash is lacking. Therefore, we investigated the characteristics of hash values in malware classification. Moreover, when surface analysis logs are viewed in chronological order, the data are considered to exhibit concept drift. Therefore, we compared malware detection performance using data with the concept drift property. We found that the hash function could be used to prevent performance degradation even with concept drift data. In an experiment on concept drift data, combining PE surface information and hash values showed the highest performance for certain data.
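
The two ideas in the paragraph above, feeding hash values to a deep learning model and evaluating on chronologically ordered data, can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not specify how hash values are encoded or where the time split is placed. The character alphabet, the `encode_hash` and `chronological_split` helpers, the `Sample` record, and the example digests are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

# Assumed alphabet: base64 characters plus ':' (the field separator used in
# ssdeep-style digests). Id 0 is reserved for padding.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/:"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(ALPHABET) + 1  # id for any character outside the alphabet


def encode_hash(digest: str, length: int = 64) -> List[int]:
    """Map a hash string to a fixed-length list of integer character ids.

    Longer digests are truncated and shorter ones are right-padded with 0,
    so every sample yields an input vector of the same size.
    """
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in digest[:length]]
    return ids + [0] * (length - len(ids))


@dataclass
class Sample:
    first_seen: date  # hypothetical collection date of the sample
    digest: str       # hash value, e.g. a fuzzy-hash digest
    label: int        # 1 = malware, 0 = benign


def chronological_split(
    samples: Sequence[Sample], cutoff: date
) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples seen before the cutoff and test on later ones.

    Unlike a random split, this keeps newer samples out of training, so any
    drift in the data over time shows up as a drop in test performance.
    """
    ordered = sorted(samples, key=lambda s: s.first_seen)
    train = [s for s in ordered if s.first_seen < cutoff]
    test = [s for s in ordered if s.first_seen >= cutoff]
    return train, test


if __name__ == "__main__":
    # Made-up digests, used only to exercise the helpers.
    data = [
        Sample(date(2020, 5, 1), "3:AXGBicFlgVNhBGcL6wCrFQEv:AXGBicFlIHBGcL6wCrFQEv", 1),
        Sample(date(2019, 3, 9), "6:hRMs3F2iFQEv:hRMs3F2iFQEv", 0),
        Sample(date(2021, 1, 15), "3:AXGBicFlgVNhBGcL6wCrFQEv:AXGBicFlIHBGcL6wCrFQEv", 1),
    ]
    train, test = chronological_split(data, cutoff=date(2020, 12, 31))
    x_train = [encode_hash(s.digest) for s in train]
    print(len(train), len(test), len(x_train[0]))
```

The integer vectors produced here would typically be passed to an embedding layer or one-hot encoded before a neural network; which option suits a given hash function is an open design choice, not something the abstract settles.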

Acknowledgments

Part of this research was supported by JST, PRESTO Grant Number JPMJPR1938 and JSPS Grants-in-Aid for Scientific Research JP19H05579.

Author information

Authors and Affiliations

Graduate School of Natural Science and Technology, Okayama University, Okayama, Japan
Takahiro Baba
Cyber-Physical Engineering Informatics Research Core, Okayama University, Okayama, Japan
Kensuke Baba
Graduate School of Natural Science and Technology, Okayama University/JST, PRESTO, Okayama, Japan
Toshihiro Yamauchi

Corresponding authors

Correspondence to Kensuke Baba or Toshihiro Yamauchi.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
University of Technology Sydney, Sydney, NSW, Australia
Farookh Hussain
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido

About this paper

Cite this paper

Baba, T., Baba, K., Yamauchi, T. (2022). Malware Classification by Deep Learning Using Characteristics of Hash Functions. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 450. Springer, Cham. https://doi.org/10.1007/978-3-030-99587-4_40

DOI: https://doi.org/10.1007/978-3-030-99587-4_40
Published: 31 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99586-7
Online ISBN: 978-3-030-99587-4
eBook Packages: Intelligent Technologies and Robotics (R0)
