Abstract
Because of the great loss and damage caused by malware, malware detection has become a central issue in computer security. Detection has to be fast and very accurate. Developing suitable methods requires high-quality benchmarks. One such benchmark is the Microsoft Kaggle malware challenge, run in 2015. Since then, over 50 papers have been published on it. The best results were achieved with complex feature engineering. In this work we analyze a black-box neural method and, as a novel contribution, evaluate its results against the Microsoft Kaggle malware challenge benchmark. It is tempting to use convolutional neural networks for malware analysis, following their great success in image analysis. However, even with balanced classes and dropout, the neural method does not beat XGBoost with feature engineering, although some room for improvement exists. The situation is similar to that in language analysis: language is much more hierarchical than images, and apparently malware is too. Malware analysis still awaits an optimal neural network architecture.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pieczyński, D., Jędrzejek, C. (2019). Malware Detection Using Black-Box Neural Method. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_20
DOI: https://doi.org/10.1007/978-3-319-98678-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)