
Improving Adversarial Robustness by Penalizing Natural Accuracy

Conference paper in Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13801)


Abstract

Current techniques in deep learning are still unable to train adversarially robust classifiers that perform as well as non-robust ones. In this work, we continue to study the space of loss functions and show that the choice of loss can affect robustness in highly nonintuitive ways.

Specifically, we demonstrate that a surprising choice of loss function can in fact improve adversarial robustness against some attacks. Our loss function encourages accuracy on adversarial examples and explicitly penalizes accuracy on natural examples. This is inspired by theoretical and empirical work suggesting a fundamental tradeoff between standard accuracy and adversarial robustness. Our method, NAturally Penalized (NAP) loss, achieves 61.5% robust accuracy on CIFAR-10 with \(\varepsilon = 8/255\) perturbations in \(\ell_\infty\) (against a PGD-60 adversary with 20 random restarts). This improves over the standard PGD defense by over 3%, and over other loss functions proposed in the literature. Although TRADES performs better on CIFAR-10 against Auto-Attack, our approach obtains better results on CIFAR-100. Our results thus suggest that significant robustness gains are possible by revisiting training techniques, even without additional data.
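The precise formulation of the NAP loss is given in the body of the paper and is not reproduced in this excerpt. The PyTorch-style sketch below is only a minimal illustration of the idea stated above: reward accuracy on adversarial examples \(\hat{x}\) while penalizing the softmax confidence \(p_y(x; \theta)\) assigned to the corresponding natural examples \(x\). The function name `nap_style_loss`, the weight `lam`, and the exact combination of terms are assumptions for illustration, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def nap_style_loss(model, x_nat, y, x_adv, lam=1.0):
    """Illustrative NAP-style objective (NOT the paper's exact formula):
    cross-entropy on adversarial examples, plus a penalty that grows with
    the softmax confidence assigned to the *natural* examples."""
    logits_adv = model(x_adv)
    logits_nat = model(x_nat)

    # Encourage accuracy on adversarial examples.
    adv_term = F.cross_entropy(logits_adv, y)

    # p_y(x; theta): softmax probability of the true label on natural inputs.
    p_nat = F.softmax(logits_nat, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)

    # Explicitly penalize natural confidence (hypothetical penalty form).
    nat_penalty = p_nat.mean()

    return adv_term + lam * nat_penalty
```

In adversarial training, `x_adv` would typically be generated on the fly with a PGD attack inside the training loop; the evaluation quoted in the abstract uses a PGD-60 adversary with 20 random restarts.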


Notes

  1. Certain losses are theoretically justified for simpler models, for example as proper scoring rules or for margin-maximizing reasons. But these justifications do not provably hold for overparameterized models such as modern deep networks.

  2. In Fig. 3 and Table 5 we penalize natural accuracy universally and find that even that works better than TRADES and MART.

References

  1. Athalye, A., Carlini, N., Wagner, D.A.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, pp. 274–283 (2018)


  2. Balunovic, M., Vechev, M.: Adversarial training and provable defenses: bridging the gap. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SJxSDxrKDr

  3. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)


  4. Carlini, N., et al.: On evaluating adversarial robustness (2019)


  5. Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J.C., Liang, P.S.: Unlabeled data improves adversarial robustness. In: Advances in Neural Information Processing Systems, pp. 11190–11201 (2019)


  6. Chen, E.C., Lee, C.R.: LTD: low temperature distillation for robust adversarial training. arXiv preprint arXiv:2111.02331 (2021)

  7. Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 1310–1320 (2019)


  8. Croce, F., et al.: Robustbench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670 (2020)

  9. Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation strategies from data. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 113–123 (2019)


  10. Dai, S., Mahloujifar, S., Mittal, P.: Parameterizing activation functions for adversarial robustness. arXiv preprint arXiv:2110.05626 (2021)

  11. Engstrom, L., et al.: A discussion of ’adversarial examples are not bugs, they are features’. Distill (2019). https://doi.org/10.23915/distill.00019. https://distill.pub/2019/advex-bugs-discussion

  12. Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, 7–10 September 2015, pp. 106.1–106.13 (2015)


  13. Geirhos, R., Temme, C.R.M., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 7549–7561 (2018)


  14. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)


  15. Gowal, S., Rebuffi, S.A., Wiles, O., Stimberg, F., Calian, D.A., Mann, T.A.: Improving robustness using generated data. In: Advances in Neural Information Processing Systems, vol. 34 (2021)


  16. Hayes, J.: Extensions and limitations of randomized smoothing for robustness guarantees. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 786–787 (2020)


  17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778 (2016)


  18. Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)


  19. Hendrycks, D., Lee, K., Mazeika, M.: Using pre-training can improve model robustness and uncertainty. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 2712–2721 (2019)


  20. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems, pp. 125–136 (2019)


  21. Kannan, H., Kurakin, A., Goodfellow, I.J.: Adversarial logit pairing. CoRR abs/1803.06373 (2018). http://arxiv.org/abs/1803.06373

  22. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  23. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386


  24. Krizhevsky, A., et al.: Learning multiple layers of features from tiny images (2009)


  25. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)


  26. Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: Certified robustness to adversarial examples with differential privacy. In: 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, 19–23 May 2019, pp. 656–672 (2019)


  27. Li, B., Chen, C., Wang, W., Carin, L.: Second-order adversarial attack and certifiable robustness. CoRR abs/1809.03113 (2018). http://arxiv.org/abs/1809.03113

  28. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

  29. Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)

  30. Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: where bigger models and more data hurt. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=B1g5sA4twr

  31. Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)


  32. Rebuffi, S.A., Gowal, S., Calian, D.A., Stimberg, F., Wiles, O., Mann, T.: Fixing data augmentation to improve adversarial robustness. arXiv preprint arXiv:2103.01946 (2021)

  33. Salman, H., Li, J., Razenshteyn, I.P., Zhang, P., Zhang, H., Bubeck, S., Yang, G.: Provably robust deep learning via adversarially trained smoothed classifiers. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada, pp. 11289–11300 (2019)


  34. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)

  35. Tan, M., Le, Q.V.: Efficientnet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 6105–6114 (2019)


  36. Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., Schmidt, L.: When robustness doesn’t promote robustness: synthetic vs. natural distribution shifts on imagenet (2020). https://openreview.net/forum?id=HyxPIyrFvH

  37. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152 (2018)

  38. Uesato, J., Alayrac, J., Huang, P., Stanforth, R., Fawzi, A., Kohli, P., et al.: Are labels required for improving adversarial robustness? In: Advances in Neural Information Processing Systems, pp. 12192–12202 (2019)


  39. Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., Gu, Q.: Improving adversarial robustness requires revisiting misclassified examples. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=rklOg6EFwS

  40. Wong, E., Schmidt, F.R., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 8410–8419 (2018)


  41. Xiao, C., Zhong, P., Zheng, C.: Enhancing adversarial defense by k-winners-take-all. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=Skgvy64tvr

  42. Xiao, K.Y., Tjeng, V., Shafiullah, N.M.M., Madry, A.: Training for faster adversarial robustness verification via inducing relu stability. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)


  43. Xie, C., Wu, Y., Maaten, L.V.D., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509 (2019)


  44. Xie, C., Yuille, A.: Intriguing properties of adversarial training at scale. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=HyxJhCEFDS

  45. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)


  46. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, 19–22 September 2016 (2016)


  47. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)

  48. Zhang, H., Yu, Y., Jiao, J., Xing, E.P., Ghaoui, L.E., Jordan, M.I.: Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573 (2019)

  49. Zhang, H., Cissé, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)



Author information

Correspondence to Kshitij Chandna.


Appendices

A Hyperparameter Search

Fig. 1. Hyperparameter search for different loss functions on WideResNet34x4.

Fig. 2. Hyperparameter search for NAP on WideResNet34x10.

Fig. 3. Hyperparameter search for different modifications of NAP, as described in Table 5.

B Softmax Distributions

Softmax Distribution of Natural vs. Adversarial Examples. To study the magnitude and influence of the “margin term” in the NAP loss, we plot the empirical distributions on the test set of \(p_y(x; \theta)\), \(p_y(\hat{x}; \theta)\), and the “margin term” \(p_y(x; \theta) - p_y(\hat{x}; \theta)\) at the end of training for the models in Table 4. The corresponding histograms are shown in Fig. 6. Interestingly, the adversarial confidence \(p_y(\hat{x}; \theta)\) is bimodal (with modes around 0 and 1) for standard adversarial training but not for the NAP loss. This suggests the NAP loss is indeed encouraging the network to behave similarly on natural and adversarial inputs, though the exact mechanism by which this yields robustness improvements is unclear and requires further study.
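As a concrete reading of the quantities plotted in Figs. 4, 5, and 6, the sketch below shows one way to collect \(p_y(x; \theta)\), \(p_y(\hat{x}; \theta)\), and the margin term over a data loader for histogramming. The attack routine `pgd_attack` and the loader are placeholders, not code from the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def true_label_prob(model, x, y):
    """p_y(x; theta): softmax probability assigned to the correct label."""
    probs = F.softmax(model(x), dim=1)
    return probs.gather(1, y.unsqueeze(1)).squeeze(1)

def collect_margins(model, loader, pgd_attack):
    """Collect natural confidence, adversarial confidence, and the
    'margin term' p_y(x) - p_y(x_hat) for histogramming (cf. Fig. 6)."""
    p_nat, p_adv = [], []
    model.eval()
    for x, y in loader:
        x_hat = pgd_attack(model, x, y)   # adversarial examples (placeholder attack)
        p_nat.append(true_label_prob(model, x, y))
        p_adv.append(true_label_prob(model, x_hat, y))
    p_nat, p_adv = torch.cat(p_nat), torch.cat(p_adv)
    return p_nat, p_adv, p_nat - p_adv    # the three quantities plotted in Fig. 6
```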

The training data also shows a similar but milder bimodal distribution under standard adversarial training (Figs. 4 and 5).

Fig. 4. Histogram of softmax probability on correct labels, for the train set at the end of standard adversarial training. Using Wide ResNet 34-10, the same model as in Table 4.

Fig. 5. Histogram of softmax probability on correct labels, for the train set at the end of training with NAP loss. Using Wide ResNet 34-10, the same model as in Table 4.

Fig. 6. Standard loss. Histogram of softmax probability on correct labels, for the test set at the end of standard adversarial training. Using Wide ResNet 34-10, the same model as in Table 4. Plotting histograms of (left to right): \(p_y(x; \theta)\), \(p_y(\hat{x}; \theta)\), and \(p_y(x; \theta) - p_y(\hat{x}; \theta)\).


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chandna, K. (2023). Improving Adversarial Robustness by Penalizing Natural Accuracy. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_33


  • DOI: https://doi.org/10.1007/978-3-031-25056-9_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25055-2

  • Online ISBN: 978-3-031-25056-9

  • eBook Packages: Computer Science, Computer Science (R0)
