Abstract
The rise in cyber-attacks and new malware over the last decade has led to the use of various machine learning techniques in security products. Although these techniques are designed to improve accuracy, practical constraints (such as keeping the false positive rate low) often influence which model is selected.
This paper focuses on how various generative adversarial networks can be used to improve the average detection rate and reduce the false positives of a given neural network by altering its training set. The result is a technique that reduces the number of false positives while preserving, or in some cases increasing, the detection rate.
Notes
- 1.
- 2. Generative adversarial networks.
- 3.
- 4. Toronto Face Dataset.
- 5. Randomized rounding Fast Gradient Sign Method.
- 6.
- 7. An unclear behaviour that is recognized as neither clean nor malware.
- 8. Application program interface.
- 9. \(Feat_{value}\) is the numerical value of a feature, \(Feat_{mean}\) is the average value of that feature across all samples in the database, and \(Feat_{stddev}\) is the standard deviation of that feature's values.
- 10. False positive rate.
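Notes 9 and 10 name quantities whose formulas do not appear in this excerpt. The sketch below assumes the usual definitions: a z-score, (Feat_value - Feat_mean) / Feat_stddev, for feature standardization, and FP / (FP + TN) for the false positive rate, with the detection rate taken as TP / (TP + FN). The function names `standardize` and `rates`, the 0/1 label encoding, and the choice of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score one feature column: (Feat_value - Feat_mean) / Feat_stddev.

    Assumed form; the excerpt only names the three quantities.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rates(y_true, y_pred):
    """Return (detection_rate, false_positive_rate).

    Labels are assumed to be 1 for malware and 0 for clean.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
    print(rates([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

Reporting the two rates separately reflects the trade-off the abstract describes: a change to the training set can be judged by whether the false positive rate drops without the detection rate falling.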
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Simion, CA., Balan, G., Gavriluţ, D.T. (2022). Using GANs to Improve the Accuracy of Machine Learning Models for Malware Detection. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_39
DOI: https://doi.org/10.1007/978-3-031-21753-1_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21752-4
Online ISBN: 978-3-031-21753-1
eBook Packages: Computer Science, Computer Science (R0)