Abstract
Attacks on machine learning systems such as malware detectors and recommendation systems have become a major threat. Data poisoning attacks are the primary method: they inject a small number of poisoning points into the training set of a machine learning model, aiming to degrade the model's overall accuracy. Targeted data poisoning is a variant of data poisoning attacks that injects malicious data to cause misclassification of targeted input data while keeping the overall accuracy almost the same as that of the unpoisoned model. Sasaki et al. first applied targeted data poisoning to malware detection and proposed an algorithm that generates poisoning points to misclassify targeted malware as goodware. Their algorithm achieved an \(85\%\) attack success rate by adding \(15\%\) poisoning points to a malware dataset with continuous variables, while restricting the increase in the test error on nontargeted data to at most \(10\%\). In this paper, we consider common defensive methods, called data sanitization defenses, against targeted data poisoning and propose a defense-aware attack algorithm. Moreover, we propose a stronger targeted poisoning algorithm based on the theoretical analysis of the optimal attack strategy by Steinhardt et al. The computational cost of our algorithm is much lower than that of existing targeted poisoning algorithms. As a result, our algorithm achieves a \(91\%\) attack success rate on the same malware dataset with continuous variables by adding the same \(15\%\) poisoning points, and generates poison data approximately \(10^3\) times faster than Sasaki's algorithm.
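To make the threat model concrete, the following is a minimal, self-contained sketch of targeted data poisoning on synthetic data. It is NOT the paper's algorithm (and implements no sanitization defense): the attacker simply injects \(15\%\) goodware-labeled points near a chosen malware target, pulling a linear model's decision boundary so the target's malware score drops while overall accuracy on clean data stays high. All class names, data, and the plain logistic-regression stand-in for the detector are illustrative assumptions.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, steps=2000):
    """Batch gradient descent on the logistic loss (bias folded into w)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(w, X):
    """Signed distance from the decision boundary (positive => malware)."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

rng = np.random.default_rng(0)
X_good = rng.normal(-2.0, 1.0, size=(200, 2))  # "goodware" cluster, label 0
X_mal = rng.normal(+2.0, 1.0, size=(200, 2))   # "malware" cluster, label 1
X = np.vstack([X_good, X_mal])
y = np.array([0] * 200 + [1] * 200)

target = np.array([[0.5, 0.5]])  # malware sample the attacker wants flipped

w_clean = train_logreg(X, y)

# Inject 15% poisoning points: goodware-labeled samples scattered around
# the target, so the learned boundary shifts past it.
n_poison = int(0.15 * len(X))
X_p = target + rng.normal(0.0, 0.3, size=(n_poison, 2))
w_pois = train_logreg(np.vstack([X, X_p]),
                      np.concatenate([y, np.zeros(n_poison)]))

acc = ((score(w_pois, X) > 0).astype(int) == y).mean()
print("target score, clean vs poisoned:",
      score(w_clean, target)[0], score(w_pois, target)[0])
print("poisoned model's accuracy on clean data:", acc)
```

Because the poison cluster sits in a region with few genuine malware samples, the boundary shift that misclassifies the target costs only a small fraction of nontargeted accuracy, which is exactly the stealth property targeted poisoning exploits.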
Notes
1. Our implementation is available at https://github.com/mlearning-security/stronger-targeted-poisoning.
References
Amos, B., Turner, H., White, J.: Applying machine learning classifiers to dynamic android malware detection at scale. In: 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1666–1671 (2013)
Anderson, H.S., Roth, P.: EMBER: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)
Anindya, I.C., Kantarcioglu, M.: Adversarial anomaly detection using centroid-based clustering. In: 2018 IEEE International Conference on Information Reuse and Integration (IRI), pp. 1–8. IEEE (2018)
Baracaldo, N., Chen, B., Ludwig, H., Safavi, J.A.: Mitigating poisoning attacks on machine learning models: A data provenance based approach. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 103–110 (2017)
Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.D.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010). https://doi.org/10.1007/s10994-010-5188-5
Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389 (2012)
Bleha, S., Slivinsky, C., Hussien, B.: Computer-access security systems using keystroke dynamics. IEEE Trans. Pattern Anal. Mach. Intell. 12(12), 1217–1222 (1990)
Cen, L., Gates, C.S., Si, L., Li, N.: A probabilistic discriminative model for android malware detection with decompiled source code. IEEE Trans. Dependable Secure Comput. 12(4), 400–412 (2014)
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017)
Cretu, G.F., Stavrou, A., Locasto, M.E., Stolfo, S.J., Keromytis, A.D.: Casting out demons: Sanitizing training data for anomaly sensors. In: 2008 IEEE Symposium on Security and Privacy (Sp 2008), pp. 81–95. IEEE (2008)
Dai, J., Chen, C., Li, Y.: A backdoor attack against LSTM-based text classification systems. IEEE Access 7, 138872–138878 (2019)
Diamond, S., Boyd, S.: CVXPY: a python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)
Du, M., Jia, R., Song, D.: Robust anomaly detection and backdoor attack detection via differential privacy. arXiv preprint arXiv:1911.07116 (2019)
Firdausi, I., Lim, C., Erwin, A., Nugroho, A.S.: Analysis of machine learning techniques used in behavior-based malware detection. In: 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, pp. 201–203 (2010)
Gavriluţ, D., Cimpoeşu, M., Anton, D., Ciortuz, L.: Malware detection using machine learning. In: 2009 International Multiconference on Computer Science and Information Technology, pp. 735–741 (2009)
Ham, H.S., Choi, M.J.: Analysis of android malware detection performance using machine learning classifiers. In: 2013 international conference on ICT Convergence (ICTC), pp. 490–495. IEEE (2013)
Hardy, W., Chen, L., Hou, S., Ye, Y., Li, X.: DL4MD: a deep learning framework for intelligent malware detection. In: Proceedings of the International Conference on Data Mining (DMIN), p. 61 (2016)
Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)
Jung, W., Kim, S., Choi, S.: Poster: deep learning for zero-day flash malware detection. In: 36th IEEE Symposium on Security and Privacy, vol. 10, pp. 2809695–2817880 (2015)
Kang, P., Hwang, S., Cho, S.: Continual retraining of keystroke dynamics based authenticator. In: Lee, S.-W., Li, S.Z. (eds.) ICB 2007. LNCS, vol. 4642, pp. 1203–1211. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74549-5_125
Koh, P.W., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1885–1894. JMLR.org (2017)
Koh, P.W., Steinhardt, J., Liang, P.: Stronger data poisoning attacks break data sanitization defenses. arXiv preprint arXiv:1811.00741 (2018)
Kolosnjaji, B., et al.: Adversarial malware binaries: Evading deep learning for malware detection in executables. In: 2018 26th European Signal Processing Conference (EUSIPCO), pp. 533–537. IEEE (2018)
Konečnỳ, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D.: Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)
Kumar, B.J., Naveen, H., Kumar, B.P., Sharma, S.S., Villegas, J.: Logistic regression for polymorphic malware detection using anova f-test. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1–5. IEEE (2017)
Kwon, J., Lee, H.: Bingraph: Discovering mutant malware using hierarchical semantic signatures. In: 2012 7th International Conference on Malicious and Unwanted Software, pp. 104–111 (2012)
Laskov, P., Schäfer, C., Kotenko, I., Müller, K.R.: Intrusion detection in unlabeled data with quarter-sphere support vector machines. PIK-praxis der Informationsverarbeitung und Kommunikation 27(4), 228–236 (2004)
Li, W.J., Wang, K., Stolfo, S.J., Herzog, B.: Fileprints: identifying file types by n-gram analysis. In: Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 64–71. IEEE (2005)
Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13
Liu, Y., et al.: Trojaning attack on neural networks (2017)
McLaughlin, N., et al.: Deep android malware detection. In: Proceedings of the Seventh ACM Conference on Data and Application Security and Privacy, CODASPY 2017, pp. 301–308. Association for Computing Machinery, New York (2017)
Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38 (2017)
Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5
Paudice, A., Muñoz-González, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C. (ed.) ECML PKDD 2018. LNCS (LNAI), vol. 11329, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_1
Rieck, K., Laskov, P.: Language models for detection of unknown attacks in network traffic. J. Comput. Virol. 2(4), 243–256 (2007)
Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135 (2018)
Santos, I., Devesa, J., Brezo, F., Nieves, J., Bringas, P.G.: OPEM: a static-dynamic approach for machine-learning-based malware detection. In: Herrero, Á., et al. (eds.) International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions. AISC, vol. 189. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-33018-6_28
Sasaki, S., Hidano, S., Uchibayashi, T., Suganuma, T., Hiji, M., Kiyomoto, S.: On embedding backdoor in malware detectors using machine learning. In: 2019 17th International Conference on Privacy, Security and Trust (PST), pp. 1–5. IEEE (2019)
Sgandurra, D., Muñoz-González, L., Mohsen, R., Lupu, E.C.: Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020 (2016)
Shafahi, A., et al.: Poison frogs! targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, pp. 6103–6113 (2018)
Siddiqui, M., Wang, M.C., Lee, J.: Data mining methods for malware detection using instruction sequences. In: Artificial Intelligence and Applications, pp. 358–363 (2008)
Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems, pp. 3517–3529 (2017)
Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, pp. 8000–8010 (2018)
Wang, B., et al.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)
Xiao, H., Xiao, H., Eckert, C.: Adversarial label flips attack on support vector machines. In: ECAI, pp. 870–875 (2012)
Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. In: Proceedings of the 2014 ACM conference on SIGCOMM, pp. 371–372 (2014)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Narisada, S. et al. (2020). Stronger Targeted Poisoning Attacks Against Malware Detection. In: Krenn, S., Shulman, H., Vaudenay, S. (eds) Cryptology and Network Security. CANS 2020. Lecture Notes in Computer Science(), vol 12579. Springer, Cham. https://doi.org/10.1007/978-3-030-65411-5_4
Print ISBN: 978-3-030-65410-8
Online ISBN: 978-3-030-65411-5