Using GANs to Improve the Accuracy of Machine Learning Models for Malware Detection

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2022 (IDEAL 2022)

Abstract

The increase in cyber-attacks and new malware over the last decade has led to the use of various machine learning techniques in security products. While these techniques are designed to improve accuracy, practical constraints (such as keeping the false positive rate low) often influence which model is selected.

This paper focuses on how various generative adversarial networks (GANs) can be used to improve the average detection rate and reduce the number of false positives for a given neural network by altering its training set. The result is a technique that reduces the number of false positives while preserving or, in some cases, increasing the detection rate.
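
The two quantities the abstract trades off can be made concrete with a small sketch. It assumes the standard definitions (detection rate as the share of malware samples flagged, false positive rate as the share of clean samples flagged); the paper's exact evaluation protocol is not shown on this page, and the function name and sample data below are hypothetical.

```python
def detection_and_fp_rates(labels, predictions):
    """Return (detection_rate, false_positive_rate) for binary verdicts.

    Assumption: 1 means malware and 0 means clean, for both the ground
    truth labels and the model predictions.
    """
    tp = fn = fp = tn = 0
    for truth, verdict in zip(labels, predictions):
        if truth == 1 and verdict == 1:
            tp += 1  # malware correctly detected
        elif truth == 1:
            fn += 1  # malware missed
        elif verdict == 1:
            fp += 1  # clean file wrongly flagged
        else:
            tn += 1  # clean file correctly passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Hypothetical verdicts: 4 malware samples followed by 5 clean samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0]
detection_rate, false_positive_rate = detection_and_fp_rates(labels, predictions)
```

Under these assumptions, a change to the training set is an improvement in the abstract's sense when the second value drops and the first does not.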

Notes

  1. https://www.av-test.org/en/statistics/malware/.

  2. Generative adversarial networks.

  3. https://en.wikipedia.org/wiki/RGBA_color_model.

  4. Toronto Face Dataset.

  5. Randomized rounding Fast Gradient Sign Method.

  6. https://msrc-blog.microsoft.com/2020/06/01/machine-learning-security-evasion-competition-2020-invites-researchers-to-defend-and-attack/.

  7. An unclear behaviour that is recognized as neither clean nor malware.

  8. Application program interface.

  9. \(Feat_{value}\) is the numerical value of a feature, \(Feat_{mean}\) is the average value of that feature across all samples in the database, and \(Feat_{stddev}\) is the standard deviation of that feature's values.

  10. False positive rate.
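
The symbols defined in note 9 suggest a standard-score (z-score) normalization of each feature. The formula itself is not reproduced on this page, so the sketch below assumes the usual form, (Feat_value - Feat_mean) / Feat_stddev, and a population standard deviation; the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_feature(values):
    """Return the z-score of every value in one feature column.

    Assumption: normalization is (Feat_value - Feat_mean) / Feat_stddev.
    The note defines the three symbols but not the formula itself.
    """
    feat_mean = mean(values)
    feat_stddev = pstdev(values)  # population standard deviation (assumed)
    if feat_stddev == 0:
        # A constant feature has no spread; map it to zeros to avoid
        # dividing by zero.
        return [0.0 for _ in values]
    return [(value - feat_mean) / feat_stddev for value in values]


# Hypothetical values of a single feature across eight samples.
feature_column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize_feature(feature_column)
```

For this column the mean is 5 and the standard deviation is 2, so the value 9.0 maps to 2.0 and the value 2.0 maps to -1.5.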

Author information

Corresponding author

Correspondence to Ciprian-Alin Simion.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Simion, CA., Balan, G., Gavriluţ, D.T. (2022). Using GANs to Improve the Accuracy of Machine Learning Models for Malware Detection. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_39

  • DOI: https://doi.org/10.1007/978-3-031-21753-1_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21752-4

  • Online ISBN: 978-3-031-21753-1

  • eBook Packages: Computer Science, Computer Science (R0)
