Abstract
Current techniques in deep learning are still unable to train adversarially robust classifiers which perform as well as non-robust ones. In this work, we continue to study the space of loss functions, and show that the choice of loss can affect robustness in highly nonintuitive ways.
Specifically, we demonstrate that a surprising choice of loss function can in fact improve adversarial robustness against some attacks. Our loss function encourages accuracy on adversarial examples and explicitly penalizes accuracy on natural examples. This is inspired by theoretical and empirical work suggesting a fundamental tradeoff between standard accuracy and adversarial robustness. Our method, the NAturally Penalized (NAP) loss, achieves 61.5% robust accuracy on CIFAR-10 under \(\ell_\infty\) perturbations of size \(\varepsilon = 8/255\) (against a PGD-60 adversary with 20 random restarts). This improves over the standard PGD defense by more than 3%, and also over other loss functions proposed in the literature. Although TRADES performs better on CIFAR-10 against AutoAttack, our approach achieves better results on CIFAR-100. Our results thus suggest that significant robustness gains are possible by revisiting training techniques, even without additional data.
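The precise NAP objective is defined in the main text; purely as an illustration of the structure described above (not the paper's exact formulation), a loss of this shape can be written as
\[ \mathcal{L}(\theta; x, y) \;=\; \ell_{\mathrm{CE}}\big(f_\theta(\hat{x}), y\big) \;+\; \lambda \,\big( p_y(x; \theta) - p_y(\hat{x}; \theta) \big), \]
where \(\hat{x}\) is an adversarial perturbation of \(x\), \(p_y(\cdot\,; \theta)\) denotes the softmax probability assigned to the true label (the notation used in Appendix B), and \(\lambda > 0\) is a hypothetical penalty weight, not a value taken from the paper.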
Notes
- 1. Certain losses are theoretically justified for simpler models, for example as proper scoring rules or for margin-maximizing reasons. But these justifications do not provably hold for overparameterized models such as modern deep networks.
References
Athalye, A., Carlini, N., Wagner, D.A.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, pp. 274–283 (2018)
Balunovic, M., Vechev, M.: Adversarial training and provable defenses: bridging the gap. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SJxSDxrKDr
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)
Carlini, N., et al.: On evaluating adversarial robustness (2019)
Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J.C., Liang, P.S.: Unlabeled data improves adversarial robustness. In: Advances in Neural Information Processing Systems, pp. 11190–11201 (2019)
Chen, E.C., Lee, C.R.: LTD: low temperature distillation for robust adversarial training. arXiv preprint arXiv:2111.02331 (2021)
Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 1310–1320 (2019)
Croce, F., et al.: Robustbench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670 (2020)
Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation strategies from data. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 113–123 (2019)
Dai, S., Mahloujifar, S., Mittal, P.: Parameterizing activation functions for adversarial robustness. arXiv preprint arXiv:2110.05626 (2021)
Engstrom, L., et al.: A discussion of 'adversarial examples are not bugs, they are features'. Distill (2019). https://doi.org/10.23915/distill.00019. https://distill.pub/2019/advex-bugs-discussion
Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, 7–10 September 2015, pp. 106.1–106.13 (2015)
Geirhos, R., Temme, C.R.M., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 7549–7561 (2018)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)
Gowal, S., Rebuffi, S.A., Wiles, O., Stimberg, F., Calian, D.A., Mann, T.A.: Improving robustness using generated data. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Hayes, J.: Extensions and limitations of randomized smoothing for robustness guarantees. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 786–787 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778 (2016)
Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)
Hendrycks, D., Lee, K., Mazeika, M.: Using pre-training can improve model robustness and uncertainty. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 2712–2721 (2019)
Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems, pp. 125–136 (2019)
Kannan, H., Kurakin, A., Goodfellow, I.J.: Adversarial logit pairing. CoRR abs/1803.06373 (2018). http://arxiv.org/abs/1803.06373
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Krizhevsky, A., et al.: Learning multiple layers of features from tiny images (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: Certified robustness to adversarial examples with differential privacy. In: 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, 19–23 May 2019, pp. 656–672 (2019)
Li, B., Chen, C., Wang, W., Carin, L.: Second-order adversarial attack and certifiable robustness. CoRR abs/1809.03113 (2018). http://arxiv.org/abs/1809.03113
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)
Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: where bigger models and more data hurt. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=B1g5sA4twr
Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)
Rebuffi, S.A., Gowal, S., Calian, D.A., Stimberg, F., Wiles, O., Mann, T.: Fixing data augmentation to improve adversarial robustness. arXiv preprint arXiv:2103.01946 (2021)
Salman, H., Li, J., Razenshteyn, I.P., Zhang, P., Zhang, H., Bubeck, S., Yang, G.: Provably robust deep learning via adversarially trained smoothed classifiers. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada, pp. 11289–11300 (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Tan, M., Le, Q.V.: Efficientnet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 6105–6114 (2019)
Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., Schmidt, L.: When robustness doesn’t promote robustness: synthetic vs. natural distribution shifts on imagenet (2020). https://openreview.net/forum?id=HyxPIyrFvH
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152 (2018)
Uesato, J., Alayrac, J., Huang, P., Stanforth, R., Fawzi, A., Kohli, P., et al.: Are labels required for improving adversarial robustness? In: Advances in Neural Information Processing Systems, pp. 12192–12202 (2019)
Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., Gu, Q.: Improving adversarial robustness requires revisiting misclassified examples. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=rklOg6EFwS
Wong, E., Schmidt, F.R., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montréal, Canada, pp. 8410–8419 (2018)
Xiao, C., Zhong, P., Zheng, C.: Enhancing adversarial defense by k-winners-take-all. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=Skgvy64tvr
Xiao, K.Y., Tjeng, V., Shafiullah, N.M.M., Madry, A.: Training for faster adversarial robustness verification via inducing relu stability. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019)
Xie, C., Wu, Y., Maaten, L.V.D., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509 (2019)
Xie, C., Yuille, A.: Intriguing properties of adversarial training at scale. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=HyxJhCEFDS
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, 19–22 September 2016 (2016)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
Zhang, H., Yu, Y., Jiao, J., Xing, E.P., Ghaoui, L.E., Jordan, M.I.: Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573 (2019)
Zhang, H., Cissé, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Proceedings (2018)
Appendices
A Hyperparameter Search
The hyperparameter search for the different modifications of NAP is described in Table 5.
B Softmax Distributions
Softmax Distribution of Natural vs. Adversarial Examples. To study the magnitude and influence of the “margin term” in the NAP loss, we plot the empirical test-set distributions of \(p_y(x; \theta )\), \(p_y(\hat{x}; \theta )\), and the “margin term” \(p_y(x; \theta ) - p_y(\hat{x}; \theta )\) at the end of training for the models in Table 4. The corresponding histograms are shown in Fig. 6. Interestingly, the adversarial confidence \(p_y(\hat{x}; \theta )\) is bimodal (with modes around 0 and 1) for standard adversarial training but not for the NAP loss. This suggests that the NAP loss is indeed encouraging the network to behave similarly on natural and adversarial inputs, though the exact mechanism by which this yields robustness improvements is unclear and requires further study.
The training data also shows a similar, though milder, bimodal distribution under standard adversarial training (Figs. 4 and 5).
Histogram of softmax probability on correct labels for the train set at the end of standard adversarial training, using Wide ResNet 34-10 (same model as Table 4)
Histogram of softmax probability on correct labels for the train set at the end of training with the NAP loss, using Wide ResNet 34-10 (same model as Table 4)
Standard Loss. Histogram of softmax probability on correct labels for the test set at the end of standard adversarial training, using Wide ResNet 34-10 (same model as Table 4). Plotted, left to right: \(p_y(x ; \theta )\), \(p_y(\hat{x};\theta )\), and \(p_y(x ; \theta ) - p_y(\hat{x};\theta )\).
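For readers who wish to reproduce these plots, the quantities shown in the histograms can be collected as in the following minimal PyTorch-style sketch. This is an illustration, not the paper's code: the adversary (here a placeholder pgd_attack), the model, and the data loader are assumed to be supplied by the user.

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

@torch.no_grad()
def true_label_probs(model, x, y):
    # Softmax probability assigned to the correct label, i.e. p_y(x; theta).
    probs = F.softmax(model(x), dim=1)
    return probs.gather(1, y.unsqueeze(1)).squeeze(1)

def plot_confidence_histograms(model, loader, pgd_attack):
    # Collect p_y(x; theta), p_y(x_hat; theta), and the "margin term" over a data loader.
    p_nat, p_adv = [], []
    model.eval()
    for x, y in loader:
        x_hat = pgd_attack(model, x, y)  # placeholder adversary, e.g. a PGD attack
        p_nat.append(true_label_probs(model, x, y))
        p_adv.append(true_label_probs(model, x_hat, y))
    p_nat, p_adv = torch.cat(p_nat), torch.cat(p_adv)

    # One histogram per quantity, mirroring the left-to-right panels described above.
    for values, title in [(p_nat, "p_y(x)"), (p_adv, "p_y(x_hat)"),
                          (p_nat - p_adv, "p_y(x) - p_y(x_hat)")]:
        plt.figure()
        plt.hist(values.cpu().numpy(), bins=50)
        plt.title(title)
    plt.show()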
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chandna, K. (2023). Improving Adversarial Robustness by Penalizing Natural Accuracy. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_33
DOI: https://doi.org/10.1007/978-3-031-25056-9_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25055-2
Online ISBN: 978-3-031-25056-9
eBook Packages: Computer Science, Computer Science (R0)