Using GANs to Improve the Accuracy of Machine Learning Models for Malware Detection

Simion, Ciprian-Alin; Balan, Gheorghe; Gavriluţ, Dragoş Teodor

doi:10.1007/978-3-031-21753-1_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13756)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

The increase in cyber-attacks and new malware over the last decade has led to the use of various machine learning techniques in security products. While these techniques are designed to improve accuracy, practical constraints (such as the need to lower the false positive rate) often influence which model is selected.

This paper focuses on how various generative adversarial networks (GANs) can be used to improve the average detection rate and reduce the false positives of a given neural network by altering its training set. The result is a technique that can be used to reduce the number of false positives while preserving, or in some cases increasing, the detection rate.

Notes

1. https://www.av-test.org/en/statistics/malware/
2. Generative adversarial networks.
3. https://en.wikipedia.org/wiki/RGBA_color_model
4. Toronto Face Dataset.
5. Randomized rounding Fast Gradient Sign Method.
6. https://msrc-blog.microsoft.com/2020/06/01/machine-learning-security-evasion-competition-2020-invites-researchers-to-defend-and-attack/
7. An unclear behaviour that is recognized as neither clean nor malware.
8. Application program interface.
9. \(Feat_{value}\) is the numerical value of a feature, \(Feat_{mean}\) is the average value of that feature across all samples in the database, and \(Feat_{stddev}\) is the standard deviation of that feature's values.
10. False positive rate.
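
The quantities named in notes 9 and 10 can be illustrated with a short sketch. The formula that uses \(Feat_{value}\), \(Feat_{mean}\) and \(Feat_{stddev}\) is not shown in this excerpt, so the code below assumes the common z-score form, (value - mean) / stddev, and the usual definition of the false positive rate, FP / (FP + TN). The function names, the use of the population standard deviation, and the label convention (1 = malware, 0 = clean) are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score one feature column: (value - mean) / stddev.

    Assumption: the population standard deviation is used; the excerpt
    does not say which variant the paper applies.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with 1 = malware and 0 = clean (assumed labels)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    # Hypothetical feature values and labels, chosen only for the demo.
    print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
    print(false_positive_rate([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0]))
```

Standardizing each feature this way puts features with very different ranges on a comparable scale, and the false positive rate is computed over clean samples only, which is why a detector can have a high detection rate and still an unacceptable false positive rate.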

Author information

Authors and Affiliations

Faculty of Computer Science, “Al.I. Cuza” University, Iaşi, Romania
Ciprian-Alin Simion, Gheorghe Balan & Dragoş Teodor Gavriluţ
Bitdefender Laboratory, Iaşi, Romania
Ciprian-Alin Simion, Gheorghe Balan & Dragoş Teodor Gavriluţ

Corresponding author

Correspondence to Ciprian-Alin Simion.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino

About this paper

Cite this paper

Simion, CA., Balan, G., Gavriluţ, D.T. (2022). Using GANs to Improve the Accuracy of Machine Learning Models for Malware Detection. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_39

DOI: https://doi.org/10.1007/978-3-031-21753-1_39
Published: 21 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21752-4
Online ISBN: 978-3-031-21753-1
eBook Packages: Computer Science, Computer Science (R0)
