Ensemble Malware Classification Using Neural Networks

Wyrwinski, Piotr; Dutkiewicz, Jakub; Jedrzejek, Czeslaw

doi:10.1007/978-3-030-59000-0_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1284)

Included in the following conference series:

International Conference on Multimedia Communications, Services and Security

Abstract

This work presents an experimental study of malware classification on the Microsoft Malware Classification Challenge 2015 dataset. We combine the approach of the winning solution to that challenge with a neural network approach. Using a combination of n-gram features for both assembly (asm) code and byte code enables us to significantly improve the result. By mixing multiple approaches, we obtain the best log-loss result so far, 0.0025. This result comes mostly from the classical XGBoost method, with n-gram contributions from the binary and assembly code. However, our understanding of this result is still incomplete. Standard neural network approaches alone (even with LSTM) give poorer results than XGBoost, which relies mostly on n-gram features. It is not clear why adding 6-grams to the binary code analysis does not improve the results. Many more options remain to be tested in the future, in particular for the networks.
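The two quantities named in the abstract, n-gram features and log-loss, can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `byte_ngrams` and `multiclass_log_loss` are hypothetical, the sample bytes and probabilities are made up for illustration, and the metric is assumed to be the usual multiclass logarithmic loss with probability clipping.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count overlapping byte n-grams in a binary blob (hypothetical helper)."""
    # Slide a window of width n over the bytes; each slice is one n-gram.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class labels.
    y_prob: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model is confidently wrong.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Illustrative inputs only: four bytes and two made-up predictions.
counts = byte_ngrams(bytes.fromhex("909090c3"), n=2)
loss = multiclass_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```

The same sliding-window counting applies to assembly code if the input is a sequence of opcode tokens rather than raw bytes. In a setup like the one the abstract describes, such counts would form the feature vectors passed to a classifier, and the classifier's predicted class probabilities would be scored with the log-loss.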

Supported by PUT statutory funds. One of the authors (CJ) acknowledges the NVIDIA GPU Grant of Quadro P6000 card.

Author information

Authors and Affiliations

Faculty of Computing, Poznan University of Technology, Poznań, Poland
Piotr Wyrwinski, Jakub Dutkiewicz & Czeslaw Jedrzejek

Corresponding author

Correspondence to Czeslaw Jedrzejek.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Kraków, Poland
Andrzej Dziech
Royal Military Academy, Brussels, Belgium
Wim Mees
Gdańsk University of Technology, Gdańsk, Poland
Andrzej Czyżewski

About this paper

Cite this paper

Wyrwinski, P., Dutkiewicz, J., Jedrzejek, C. (2020). Ensemble Malware Classification Using Neural Networks. In: Dziech, A., Mees, W., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2020. Communications in Computer and Information Science, vol 1284. Springer, Cham. https://doi.org/10.1007/978-3-030-59000-0_10

DOI: https://doi.org/10.1007/978-3-030-59000-0_10
Published: 24 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58999-8
Online ISBN: 978-3-030-59000-0
eBook Packages: Computer Science, Computer Science (R0)
