Skip to main content

Stronger Targeted Poisoning Attacks Against Malware Detection

  • Conference paper
  • First Online:
Cryptology and Network Security (CANS 2020)

Abstract

Attacks on machine learning systems such as malware detectors and recommendation systems are becoming a major threat. Data poisoning attacks are the primary method used; they inject a small amount of poisoning points into a training set of the machine learning model, aiming to degrade the overall accuracy of the model. Targeted data poisoning is a variant of data poisoning attacks that injects malicious data into the model to cause a misclassification of the targeted input data while keeping almost the same overall accuracy as the unpoisoned model. Sasaki et al. first applied targeted data poisoning to malware detection and proposed an algorithm to generate poisoning points to misclassify targeted malware as goodware. Their algorithm achieved \(85\%\) an attack success rate by adding \(15\%\) poisoning points for malware dataset with continuous variables while restricting the increase in the test error on nontargeted data to at most \(10\%\). In this paper, we consider common defensive methods called data sanitization defenses, against targeted data poisoning and propose a defense-aware attack algorithm. Moreover, we propose a stronger targeted poisoning algorithm based on the theoretical analysis of the optimal attack strategy proposed by Steinhardt et al. The computational cost of our algorithm is much less than that of existing targeted poisoning algorithms. As a result, our new algorithm achieves a \(91\%\) attack success rate for malware dataset with continuous variables by adding the same \(15\%\) poisoning points and is approximately \(10^3\) times faster in terms of the computational time needed to generate poison data than Sasaki’s algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Our implementation is available on https://github.com/mlearning-security/stronger-targeted-poisoning.

References

  1. Amos, B., Turner, H., White, J.: Applying machine learning classifiers to dynamic android malware detection at scale. In: 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1666–1671 (2013)

    Google Scholar 

  2. Anderson, H.S., Roth, P.: EMBER: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)

  3. Anindya, I.C., Kantarcioglu, M.: Adversarial anomaly detection using centroid-based clustering. In: 2018 IEEE International Conference on Information Reuse and Integration (IRI), pp. 1–8. IEEE (2018)

    Google Scholar 

  4. Baracaldo, N., Chen, B., Ludwig, H., Safavi, J.A.: Mitigating poisoning attacks on machine learning models: A data provenance based approach. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 103–110 (2017)

    Google Scholar 

  5. Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.D.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010). https://doi.org/10.1007/s10994-010-5188-5

    Article  MathSciNet  MATH  Google Scholar 

  6. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389 (2012)

  7. Bleha, S., Slivinsky, C., Hussien, B.: Computer-access security systems using keystroke dynamics. IEEE Trans. Pattern Anal. Mach. Intell. 12(12), 1217–1222 (1990)

    Article  Google Scholar 

  8. Cen, L., Gates, C.S., Si, L., Li, N.: A probabilistic discriminative model for android malware detection with decompiled source code. IEEE Trans. Dependable Secure Comput. 12(4), 400–412 (2014)

    Article  Google Scholar 

  9. Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017)

  10. Cretu, G.F., Stavrou, A., Locasto, M.E., Stolfo, S.J., Keromytis, A.D.: Casting out demons: Sanitizing training data for anomaly sensors. In: 2008 IEEE Symposium on Security and Privacy (Sp 2008), pp. 81–95. IEEE (2008)

    Google Scholar 

  11. Dai, J., Chen, C., Li, Y.: A backdoor attack against LSTM-based text classification systems. IEEE Access 7, 138872–138878 (2019)

    Article  Google Scholar 

  12. Diamond, S., Boyd, S.: CVXPY: a python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)

    MathSciNet  MATH  Google Scholar 

  13. Du, M., Jia, R., Song, D.: Robust anomaly detection and backdoor attack detection via differential privacy. arXiv preprint arXiv:1911.07116 (2019)

  14. Firdausi, I., lim, C., Erwin, A., Nugroho, A.S.: Analysis of machine learning techniques used in behavior-based malware detection. In: 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, pp. 201–203 (2010)

    Google Scholar 

  15. Gavriluţ, D., Cimpoeşu, M., Anton, D., Ciortuz, L.: Malware detection using machine learning. In: 2009 International Multiconference on Computer Science and Information Technology, pp. 735–741 (2009)

    Google Scholar 

  16. Ham, H.S., Choi, M.J.: Analysis of android malware detection performance using machine learning classifiers. In: 2013 international conference on ICT Convergence (ICTC), pp. 490–495. IEEE (2013)

    Google Scholar 

  17. Hardy, W., Chen, L., Hou, S., Ye, Y., Li, X.: DL4MD: a deep learning framework for intelligent malware detection. In: Proceedings of the International Conference on Data Mining (DMIN), p. 61 (2016)

    Google Scholar 

  18. Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)

    Article  Google Scholar 

  19. Jung, W., Kim, S., Choi, S.: Poster: deep learning for zero-day flash malware detection. In: 36th IEEE Symposium on Security and Privacy, vol. 10, pp. 2809695–2817880 (2015)

    Google Scholar 

  20. Kang, P., Hwang, S., Cho, S.: Continual retraining of keystroke dynamics based authenticator. In: Lee, S.-W., Li, S.Z. (eds.) ICB 2007. LNCS, vol. 4642, pp. 1203–1211. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74549-5_125

    Chapter  Google Scholar 

  21. Koh, P.W., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70. pp. 1885–1894. (2017) JMLR. org

    Google Scholar 

  22. Koh, P.W., Steinhardt, J., Liang, P.: Stronger data poisoning attacks break data sanitization defenses. arXiv preprint arXiv:1811.00741 (2018)

  23. Kolosnjaji, B., et al.: Adversarial malware binaries: Evading deep learning for malware detection in executables. In: 2018 26th European Signal Processing Conference (EUSIPCO), pp. 533–537. IEEE (2018)

    Google Scholar 

  24. Konečnỳ, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D.: Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)

  25. Kumar, B.J., Naveen, H., Kumar, B.P., Sharma, S.S., Villegas, J.: Logistic regression for polymorphic malware detection using anova f-test. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1–5. IEEE (2017)

    Google Scholar 

  26. Kwon, J., Lee, H.: Bingraph: Discovering mutant malware using hierarchical semantic signatures. In: 2012 7th International Conference on Malicious and Unwanted Software, pp. 104–111 (2012)

    Google Scholar 

  27. Laskov, P., Schäfer, C., Kotenko, I., Müller, K.R.: Intrusion detection in unlabeled data with quarter-sphere support vector machines. PIK-praxis der Informationsverarbeitung und Kommunikation 27(4), 228–236 (2004)

    Article  Google Scholar 

  28. Li, W.J., Wang, K., Stolfo, S.J., Herzog, B.: Fileprints: identifying file types by n-gram analysis. In: Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 64–71. IEEE (2005)

    Google Scholar 

  29. Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13

    Chapter  Google Scholar 

  30. Liu, Y., et al.: Trojaning attack on neural networks (2017)

    Google Scholar 

  31. McLaughlin, N. et al.: Deep android malware detection. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, CODASPY 2017, p. 301-308. Association for Computing Machinery, New York (2017)

    Google Scholar 

  32. Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38 (2017)

    Google Scholar 

  33. Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5

    Chapter  Google Scholar 

  34. Paudice, A., Muñoz-González, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C. (ed.) ECML PKDD 2018. LNCS (LNAI), vol. 11329, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_1

    Chapter  Google Scholar 

  35. Rieck, K., Laskov, P.: Language models for detection of unknown attacks in network traffic. J. Comput. Virol. 2(4), 243–256 (2007)

    Article  Google Scholar 

  36. Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135 (2018)

  37. Santos, I., Devesa, J., Brezo, F., Nieves, J., Bringas, P.G.: OPEM: a static-dynamic approach for machine-learning-based malware detection. In: Herrero, À. et al. (eds.) International Joint Conference CISIS’ 12-ICEUTE’ 12-SOCO’ 12 Special Sessions. Advances in Intelligent Systems and Computing, vol. 189. Springer, Berlin (2013) https://doi.org/10.1007/978-3-642-33018-6_28

  38. Sasaki, S., Hidano, S., Uchibayashi, T., Suganuma, T., Hiji, M., Kiyomoto, S.: On embedding backdoor in malware detectors using machine learning. In: 2019 17th International Conference on Privacy, Security and Trust (PST), pp. 1–5. IEEE (2019)

    Google Scholar 

  39. Sgandurra, D., Muñoz-González, L., Mohsen, R., Lupu, E.C.: Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020 (2016)

  40. Shafahi, A., et al.: Poison frogs! targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, pp. 6103–6113 (2018)

    Google Scholar 

  41. Siddiqui, M., Wang, M.C., Lee, J.: Data mining methods for malware detection using instruction sequences. In: Artificial Intelligence and Applications, pp. 358–363 (2008)

    Google Scholar 

  42. Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems, pp. 3517–3529 (2017)

    Google Scholar 

  43. Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, pp. 8000–8010 (2018)

    Google Scholar 

  44. Wang, B., et al.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)

    Google Scholar 

  45. Xiao, H., Xiao, H., Eckert, C.: Adversarial label flips attack on support vector machines. In: ECAI, pp. 870–875 (2012)

    Google Scholar 

  46. Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. In: Proceedings of the 2014 ACM conference on SIGCOMM, pp. 371–372 (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shintaro Narisada .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Narisada, S. et al. (2020). Stronger Targeted Poisoning Attacks Against Malware Detection. In: Krenn, S., Shulman, H., Vaudenay, S. (eds) Cryptology and Network Security. CANS 2020. Lecture Notes in Computer Science(), vol 12579. Springer, Cham. https://doi.org/10.1007/978-3-030-65411-5_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-65411-5_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65410-8

  • Online ISBN: 978-3-030-65411-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics