Peekaboo: Hide and Seek with Malware Through Lightweight Multi-feature Based Lenient Hybrid Approach

Liu, Mingchang; Sachidananda, Vinay; Peng, Hongyi; Patil, Rajendra; Muneeswaran, Sivaanandh; Gurusamy, Mohan

doi:10.1007/978-3-031-15777-6_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13407)

Included in the following conference series:

International Conference on Information and Communications Security

Abstract

In this paper, we propose Peekaboo, a multiple feature-based lenient hybrid analysis for malware detection and classification. Our solution uses application programming interface (API) calls and operational codes (opcodes), extracted dynamically and statically respectively, as the behavioral features, and uses a Recurrent Neural Network (RNN) to model both static and dynamic malicious behaviors. Peekaboo carries out dynamic analysis for a subset of samples, and static analysis for all samples in a large corpus, leading to lenient hybrid analysis. Peekaboo's novelty lies in reducing the computational overhead of dynamic analysis while still utilizing multiple features to improve model performance, making it lightweight and suitable for real-world deployment for malware detection and classification at a large scale.

We have conducted multiple sets of experiments by training and evaluating Peekaboo on a large dataset. Our results show a 99.67% binary classification (benign vs. malicious) accuracy and a 96.30% multi-class classification (assigning samples to malware classes) accuracy, with a false positive rate (FPR) as low as 0.45%. Compared with our baseline model, Peekaboo increases the accuracy by more than 1% for binary classification and 5% in the multi-class setting. In addition, we tested Peekaboo on unseen malware classes, where it improved the accuracy by almost 4% over our baseline.
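For readers less familiar with the two metrics quoted above, the following minimal sketch shows how accuracy and false positive rate are conventionally computed for the binary (benign vs. malicious) case. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its dataset.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, false_positive_rate); label 1 = malicious, 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged as malware
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed as benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR is the share of benign samples that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels, invented purely for illustration.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
acc, fpr = binary_metrics(y_true, y_pred)
```

A low FPR matters in deployment because benign files usually far outnumber malicious ones, so even a small rate can translate into many false alarms.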

Notes

1. VirusTotal: https://www.virustotal.com/.
2. Radare2 version 3.9.0: https://www.radare.org/n/radare2.html.
3. R2pipe version 4.0.0: https://github.com/radareorg/radare2-r2pipe.
4. Softpedia: https://www.softpedia.com/.
5. AVClass2 source code: https://github.com/malicialab/avclass.

Acknowledgment

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Corporate Laboratory@University Scheme, National University of Singapore, and Singapore Telecommunications Ltd.

Author information

Authors and Affiliations

National University of Singapore, Singapore, Singapore
Mingchang Liu, Vinay Sachidananda, Hongyi Peng, Rajendra Patil, Sivaanandh Muneeswaran & Mohan Gurusamy

Corresponding author

Correspondence to Mingchang Liu.

Editor information

Editors and Affiliations

University of Malaga, Malaga, Spain
Cristina Alcaraz
University of Surrey, Guildford, UK
Liqun Chen
University of Kent, Canterbury, UK
Shujun Li
University of Milan, Milan, Italy
Pierangela Samarati

A Background

Static Analysis refers to the analysis of a binary file without executing it.

Dynamic Analysis refers to the analysis of a binary file by executing it in a controlled and well-monitored environment, e.g., a virtual machine or sandbox.

Terminology. In the malware analysis context, a feature often means a type of data extracted from the samples that can characterize their maliciousness. This use of the term differs from the usual machine learning setting, where features represent the attributes of the observations. The term "multiple features" here refers to multiple kinds of features in the malware analysis sense. For example, Peekaboo uses API calls and opcodes as features.
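To make the notion of a sequence-type feature concrete, the sketch below turns opcode (or API call) token sequences into fixed-length integer sequences, the usual input form for a recurrent model. It is an illustrative assumption rather than Peekaboo's actual preprocessing: the reserved ids, the function names, and the toy opcode sequences are all hypothetical.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens (assumed convention)


def build_vocab(sequences, min_count=1):
    """Map each token seen at least min_count times to an integer id >= 2."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if c >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(seq, vocab, max_len):
    """Truncate or right-pad a token sequence to max_len integer ids."""
    ids = [vocab.get(tok, UNK) for tok in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Toy opcode sequences, invented for illustration.
opcode_seqs = [["push", "mov", "call"], ["mov", "mov", "ret"]]
vocab = build_vocab(opcode_seqs)
encoded = encode(["mov", "xor", "ret"], vocab, max_len=5)  # "xor" is unseen -> UNK
```

The same two functions apply unchanged to API call names, which is one reason both kinds of feature can be fed to the same family of sequence models.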

Few-Shot Learning (FSL) is a learning strategy that can improve a model's generalization ability when the sample size is small. FSL is essential to Peekaboo when training the model on the API call dataset, since we select only a small portion of the entire corpus of samples for dynamic analysis.
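The text here does not say how the small portion of samples is chosen for dynamic analysis. Purely as an illustration of one plausible choice, the sketch below draws a per-class random subset (stratified sampling) so that every malware class keeps at least one dynamically analyzed sample. The function name, the `fraction` parameter, and the toy labels are assumptions, not the authors' procedure.

```python
import math
import random
from collections import defaultdict


def select_for_dynamic_analysis(labels, fraction, seed=0):
    """Pick roughly `fraction` of the sample ids from each class, at least one per class.

    labels: dict mapping sample id -> class name (hypothetical input format).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_class = defaultdict(list)
    for sample_id, cls in labels.items():
        by_class[cls].append(sample_id)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen = []
    for cls in sorted(by_class):
        ids = sorted(by_class[cls])
        k = max(1, math.ceil(fraction * len(ids)))
        chosen.extend(rng.sample(ids, k))
    return chosen


# Toy corpus, invented for illustration: 10 "trojan" and 3 "worm" samples.
labels = {f"m{i}": "trojan" for i in range(10)}
labels.update({f"w{i}": "worm" for i in range(3)})
subset = select_for_dynamic_analysis(labels, fraction=0.5, seed=0)
```

Only the samples in `subset` would then be run in the sandbox; the rest contribute static features alone.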

Multi-view Learning is a learning strategy that deals with data consisting of different views. A view can be a set of features obtained from one domain. In our setting, one view is the API calls collected during dynamic analysis and the other is the opcodes from static analysis. Multi-view learning aims to integrate the data for model training or use custom learning strategies to teach learners to consume data from different views to perform well on a common task. Partial multi-view learning is a task that specializes in handling missing views.
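A minimal sketch of how a corpus with a missing view might be represented is given below, assuming every sample has the static (opcode) view and only some have the dynamic (API call) view. The `Sample` class, its field names, and the toy data are hypothetical and serve only to illustrate the partial multi-view setting described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    name: str
    opcodes: List[str]                      # static view, present for every sample
    api_calls: Optional[List[str]] = None   # dynamic view, None when not sandboxed


def view_mask(sample: Sample) -> Tuple[int, int]:
    """Return (has_static_view, has_dynamic_view) as 0/1 flags."""
    return (1, 0 if sample.api_calls is None else 1)


def split_by_view(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Separate samples with both views from samples with the static view only."""
    complete = [s for s in samples if s.api_calls is not None]
    static_only = [s for s in samples if s.api_calls is None]
    return complete, static_only


# Toy corpus, invented for illustration.
corpus = [
    Sample("a.exe", ["push", "mov", "call"], ["CreateFileW", "WriteFile"]),
    Sample("b.exe", ["mov", "xor", "ret"]),
    Sample("c.exe", ["jmp", "nop"]),
]
complete, static_only = split_by_view(corpus)
masks = [view_mask(s) for s in corpus]
```

A mask of this kind lets a training loop decide per sample whether the dynamic branch of a model receives real input or has to be skipped or imputed.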

Cite this paper

Liu, M., Sachidananda, V., Peng, H., Patil, R., Muneeswaran, S., Gurusamy, M. (2022). Peekaboo: Hide and Seek with Malware Through Lightweight Multi-feature Based Lenient Hybrid Approach. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_29

DOI: https://doi.org/10.1007/978-3-031-15777-6_29
Published: 24 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15776-9
Online ISBN: 978-3-031-15777-6
eBook Packages: Computer Science, Computer Science (R0)
