
Fairness is essential for robustness: fair adversarial training by identifying and augmenting hard examples

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Adversarial training is widely considered the most effective defense against adversarial attacks. However, recent studies have shown that adversarially trained models exhibit a large discrepancy in class-wise robustness, which raises two issues: first, the overall robustness of a model is limited by its weakest class; second, the unequal protection raises ethical concerns, as certain societal demographic groups may receive weaker defenses. Despite these issues, solutions to this discrepancy remain largely underexplored. In this paper, we move beyond existing methods that operate at the class level. Our investigation reveals that hard examples, identified by higher cross-entropy values, provide more fine-grained information about the discrepancy. Furthermore, we find that enhancing the diversity of hard examples effectively reduces the robustness gap between classes. Motivated by these observations, we propose Fair Adversarial Training (FairAT) to mitigate the discrepancy in class-wise robustness. Extensive experiments on various benchmark datasets and adversarial attacks demonstrate that FairAT outperforms state-of-the-art methods in terms of both overall robustness and fairness. For a WRN-28-10 model trained on CIFAR10, FairAT improves the average and worst-class robustness by 2.13% and 4.50%, respectively.
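The abstract outlines the two mechanisms behind FairAT: flagging hard examples by their per-sample cross-entropy and augmenting them for diversity before adversarial training. The exact selection threshold, augmentation pipeline, and training schedule are not specified here, so the following is a minimal PyTorch sketch under assumed choices: a fixed top-k fraction for "hard", a random horizontal flip as the diversity augmentation (the paper's actual augmentations may differ), and standard PGD for the inner maximization. All function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def identify_hard_examples(model, x, y, frac=0.2):
    """Flag the top `frac` of samples by per-sample cross-entropy
    as 'hard'. The fraction is an assumed hyperparameter."""
    model.eval()
    with torch.no_grad():
        loss = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(frac * x.size(0)))
    return loss.topk(k).indices

def diversify(x):
    """Assumed diversity augmentation: random horizontal flip.
    The paper's concrete augmentation choices may differ."""
    flip = torch.rand(x.size(0), device=x.device) < 0.5
    x = x.clone()
    x[flip] = torch.flip(x[flip], dims=[-1])
    return x

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD inner maximization."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def fair_at_step(model, optimizer, x, y):
    """One FairAT-style training step: augment the hard examples,
    then run ordinary adversarial training on the modified batch."""
    hard = identify_hard_examples(model, x, y)
    x = x.clone()
    x[hard] = diversify(x[hard])
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full implementation would fold this step into the training loop and track class-wise robust accuracy over epochs to verify that the worst-class robustness gap actually shrinks.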



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. U20B2049, U21B2018 and 62302344).

Author information


Corresponding author

Correspondence to Qian Wang.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Ningping Mou received the BE degree from Wuhan University, China in 2021. He is currently working toward the PhD degree in the School of Cyber Science and Engineering, Wuhan University, China, and a joint PhD degree with City University of Hong Kong, China. His research interests include machine learning and AI security.

Xinli Yue received the BE degree from Wuhan University, China in 2022. He is currently working toward the MS degree in the School of Cyber Science and Engineering, Wuhan University, China. His research interests include machine learning and AI security.

Lingchen Zhao is currently an associate professor with the School of Cyber Science and Engineering, Wuhan University, China. He received his PhD degree in Cyberspace Security from Wuhan University, China in 2021 and his BE degree in Information Security from Central South University, China in 2016. He was a postdoctoral researcher with City University of Hong Kong, China from 2021 to 2022. His research interests include data security and AI security.

Qian Wang (Fellow, IEEE) is currently a professor with the School of Cyber Science and Engineering, Wuhan University, China. He has published more than 200 papers, including more than 120 in top-tier international conferences such as USENIX NSDI, IEEE S&P, ACM CCS, USENIX Security, and NDSS, and his work has received more than 20,000 Google Scholar citations. He has long been engaged in research on cyberspace security, with a focus on AI security, data outsourcing security and privacy, wireless systems security, and applied cryptography. He was a recipient of the 2018 IEEE TCSC Award for Excellence in Scalable Computing (Early Career Researcher) and the 2016 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award. He serves as an Associate Editor for IEEE TDSC, IEEE TIFS, and IEEE TETC.


About this article


Cite this article

Mou, N., Yue, X., Zhao, L. et al. Fairness is essential for robustness: fair adversarial training by identifying and augmenting hard examples. Front. Comput. Sci. 19, 193803 (2025). https://doi.org/10.1007/s11704-024-3587-1
