Abstract
Current techniques in deep learning are still unable to train adversarially robust classifiers which perform as well as non-robust ones. In this work, we continue to study the space of loss functions, and show that the choice of loss can affect robustness in highly nonintuitive ways.
Specifically, we demonstrate that a surprising choice of loss function can in fact improve adversarial robustness against some attacks. Our loss function encourages accuracy on adversarial examples and explicitly penalizes accuracy on natural examples. This is inspired by theoretical and empirical work suggesting a fundamental tradeoff between standard accuracy and adversarial robustness. Our method, the NAturally Penalized (NAP) loss, achieves 61.5% robust accuracy on CIFAR-10 under \(\ell_\infty\) perturbations of size \(\varepsilon = 8/255\) (against a PGD-60 adversary with 20 random restarts). This improves over the standard PGD defense by more than 3%, and also over other loss functions proposed in the literature. Although TRADES performs better on CIFAR-10 against AutoAttack, our approach achieves better results on CIFAR-100. Our results thus suggest that significant robustness gains are possible by revisiting training techniques, even without additional data.
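The precise NAP objective is defined in the main text; purely as an illustration of the structure described above (not the paper's exact formulation), a loss of this shape can be written as
\[ \mathcal{L}(\theta; x, y) \;=\; \ell_{\mathrm{CE}}\big(f_\theta(\hat{x}), y\big) \;+\; \lambda \,\big( p_y(x; \theta) - p_y(\hat{x}; \theta) \big), \]
where \(\hat{x}\) is an adversarial perturbation of \(x\), \(p_y(\cdot\,; \theta)\) denotes the softmax probability assigned to the true label (the notation used in Appendix B), and \(\lambda > 0\) is a hypothetical penalty weight, not a value taken from the paper.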
Notes
- 1. Certain losses are theoretically justified for simpler models, for example as proper scoring rules or for margin-maximizing reasons. But these justifications do not provably hold for overparameterized models such as modern deep networks.
References
Athalye, A., Carlini, N., Wagner, D.A.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, pp. 274–283 (2018)
Balunovic, M., Vechev, M.: Adversarial training and provable defenses: bridging the gap. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SJxSDxrKDr
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)
Carlini, N., et al.: On evaluating adversarial robustness (2019)
Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J.C., Liang, P.S.: Unlabeled data improves adversarial robustness. In: Advances in Neural Information Processing Systems, pp. 11190–11201 (2019)
Chen, E.C., Lee, C.R.: LTD: low temperature distillation for robust adversarial training. arXiv preprint arXiv:2111.02331 (2021)
Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 1310–1320 (2019)
Croce, F., et al.: Robustbench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670 (2020)
Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation strategies from data. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 113–123 (2019)
Dai, S., Mahloujifar, S., Mittal, P.: Parameterizing activation functions for adversarial robustness. arXiv preprint arXiv:2110.05626 (2021)
Engstrom, L., et al.: A discussion of 'adversarial examples are not bugs, they are features'. Distill (2019). https://doi.org/10.23915/distill.00019. https://distill.pub/2019/advex-bugs-discussion
Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, 7–10 September 2015, pp. 106.1–106.13 (2015)
Geirhos, R., Temme, C.R.M., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 7549–7561 (2018)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)
Gowal, S., Rebuffi, S.A., Wiles, O., Stimberg, F., Calian, D.A., Mann, T.A.: Improving robustness using generated data. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Hayes, J.: Extensions and limitations of randomized smoothing for robustness guarantees. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 786–787 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778 (2016)
Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)
Hendrycks, D., Lee, K., Mazeika, M.: Using pre-training can improve model robustness and uncertainty. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 2712–2721 (2019)
Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems, pp. 125–136 (2019)
Kannan, H., Kurakin, A., Goodfellow, I.J.: Adversarial logit pairing. CoRR abs/1803.06373 (2018). http://arxiv.org/abs/1803.06373
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Krizhevsky, A., et al.: Learning multiple layers of features from tiny images (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: Certified robustness to adversarial examples with differential privacy. In: 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, 19–23 May 2019, pp. 656–672 (2019)
Li, B., Chen, C., Wang, W., Carin, L.: Second-order adversarial attack and certifiable robustness. CoRR abs/1809.03113 (2018). http://arxiv.org/abs/1809.03113
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)
Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: where bigger models and more data hurt. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=B1g5sA4twr
Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)
Rebuffi, S.A., Gowal, S., Calian, D.A., Stimberg, F., Wiles, O., Mann, T.: Fixing data augmentation to improve adversarial robustness. arXiv preprint arXiv:2103.01946 (2021)
Salman, H., Li, J., Razenshteyn, I.P., Zhang, P., Zhang, H., Bubeck, S., Yang, G.: Provably robust deep learning via adversarially trained smoothed classifiers. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada, pp. 11289–11300 (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Tan, M., Le, Q.V.: Efficientnet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 6105–6114 (2019)
Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., Schmidt, L.: When robustness doesn’t promote robustness: synthetic vs. natural distribution shifts on imagenet (2020). https://openreview.net/forum?id=HyxPIyrFvH
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152 (2018)
Uesato, J., Alayrac, J., Huang, P., Stanforth, R., Fawzi, A., Kohli, P., et al.: Are labels required for improving adversarial robustness? In: Advances in Neural Information Processing Systems, pp. 12192–12202 (2019)
Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., Gu, Q.: Improving adversarial robustness requires revisiting misclassified examples. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=rklOg6EFwS
Wong, E., Schmidt, F.R., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 8410–8419 (2018)
Xiao, C., Zhong, P., Zheng, C.: Enhancing adversarial defense by k-winners-take-all. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=Skgvy64tvr
Xiao, K.Y., Tjeng, V., Shafiullah, N.M.M., Madry, A.: Training for faster adversarial robustness verification via inducing relu stability. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)
Xie, C., Wu, Y., Maaten, L.V.D., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509 (2019)
Xie, C., Yuille, A.: Intriguing properties of adversarial training at scale. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=HyxJhCEFDS
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, 19–22 September 2016 (2016)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
Zhang, H., Yu, Y., Jiao, J., Xing, E.P., Ghaoui, L.E., Jordan, M.I.: Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573 (2019)
Zhang, H., Cissé, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)
Appendices
A Hyperparameter Search
The hyperparameter search for the different modifications of NAP is described in Table 5.
B Softmax Distributions
Softmax Distribution of Natural vs. Adversarial Examples. To study the magnitude and influence of the “margin term” in the NAP loss, we plot the empirical test-set distributions of \(p_y(x; \theta )\), \(p_y(\hat{x}; \theta )\), and the “margin term” \(p_y(x; \theta ) - p_y(\hat{x}; \theta )\) at the end of training for the models in Table 4. The corresponding histograms are shown in Fig. 6. Interestingly, the adversarial confidence \(p_y(\hat{x}; \theta )\) is bimodal (with modes around 0 and 1) for standard adversarial training but not for the NAP loss. This suggests that the NAP loss is indeed encouraging the network to behave similarly on natural and adversarial inputs, though the exact mechanism by which this yields robustness improvements is unclear and requires further study.
The training data also shows a similar, though milder, bimodal distribution under standard adversarial training (Figs. 4 and 5).
Histogram of softmax probability on correct labels for the train set at the end of standard adversarial training, using Wide ResNet 34-10 (same model as Table 4)
Histogram of softmax probability on correct labels for the train set at the end of training with the NAP loss, using Wide ResNet 34-10 (same model as Table 4)
Standard Loss. Histogram of softmax probability on correct labels for the test set at the end of standard adversarial training, using Wide ResNet 34-10 (same model as Table 4). Plotted, left to right: \(p_y(x ; \theta )\), \(p_y(\hat{x};\theta )\), and \(p_y(x ; \theta ) - p_y(\hat{x};\theta )\).
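For readers who wish to reproduce these plots, the quantities shown in the histograms can be collected as in the following minimal PyTorch-style sketch. This is an illustration, not the paper's code: the adversary (here a placeholder pgd_attack), the model, and the data loader are assumed to be supplied by the user.

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

@torch.no_grad()
def true_label_probs(model, x, y):
    # Softmax probability assigned to the correct label, i.e. p_y(x; theta).
    probs = F.softmax(model(x), dim=1)
    return probs.gather(1, y.unsqueeze(1)).squeeze(1)

def plot_confidence_histograms(model, loader, pgd_attack):
    # Collect p_y(x; theta), p_y(x_hat; theta), and the "margin term" over a data loader.
    p_nat, p_adv = [], []
    model.eval()
    for x, y in loader:
        x_hat = pgd_attack(model, x, y)  # placeholder adversary, e.g. a PGD attack
        p_nat.append(true_label_probs(model, x, y))
        p_adv.append(true_label_probs(model, x_hat, y))
    p_nat, p_adv = torch.cat(p_nat), torch.cat(p_adv)

    # One histogram per quantity, mirroring the left-to-right panels described above.
    for values, title in [(p_nat, "p_y(x)"), (p_adv, "p_y(x_hat)"),
                          (p_nat - p_adv, "p_y(x) - p_y(x_hat)")]:
        plt.figure()
        plt.hist(values.cpu().numpy(), bins=50)
        plt.title(title)
    plt.show()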
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chandna, K. (2023). Improving Adversarial Robustness by Penalizing Natural Accuracy. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_33
DOI: https://doi.org/10.1007/978-3-031-25056-9_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25055-2
Online ISBN: 978-3-031-25056-9
eBook Packages: Computer Science, Computer Science (R0)