
Balanced Gradient Penalty Improves Deep Long-Tailed Learning

Published: 10 October 2022

Abstract

In recent years, deep learning has achieved great success in various image recognition tasks. However, real-world applications often exhibit a long-tailed distribution over semantic classes. Common methods focus on optimization for balanced distributions or on naive models; few works study long-tailed learning from the perspective of deep-learning generalization. This work is the first to investigate the loss landscape of long-tailed learning. Empirical results show that sharpness-aware optimizers do not work well in the long-tailed setting: they do not take class priors into account and fail to improve the performance of few-shot classes. To better guide the network and explicitly alleviate sharpness without extra computational burden, we develop a universal Balanced Gradient Penalty (BGP) method. Notably, BGP does not require detailed class priors and thus preserves privacy. As a regularization loss, BGP achieves state-of-the-art results on various image datasets (i.e., CIFAR-LT, ImageNet-LT, and iNaturalist-2018) across different imbalance ratios.
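The paper's actual BGP algorithm is not reproduced here. As an illustrative sketch only, the toy example below shows the generic idea the abstract builds on: adding a squared gradient-norm penalty to the training loss as a flatness regularizer, demonstrated on a deliberately imbalanced binary dataset. The logistic model, the penalty weight `lam`, and all function names are our own assumptions for illustration, not the paper's method.

```python
import numpy as np

def ce_loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic model and its gradient in w."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def gp_loss(w, X, y, lam=0.5):
    """Plain loss plus a squared gradient-norm penalty (a flatness proxy)."""
    loss, grad = ce_loss_and_grad(w, X, y)
    return loss + lam * np.sum(grad ** 2)

def numerical_grad(f, w, h=1e-5):
    """Central-difference gradient, so we avoid deriving the Hessian term
    that an exact gradient of the penalty would require."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2.0 * h)
    return g

# Toy long-tailed binary problem: 90% of samples come from class 0.
rng = np.random.default_rng(0)
n0, n1 = 180, 20
X = np.vstack([rng.normal(-1.0, 1.0, (n0, 2)),
               rng.normal(1.0, 1.0, (n1, 2))])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Gradient descent on the penalized objective.
w = np.zeros(2)
f = lambda w_: gp_loss(w_, X, y)
for _ in range(200):
    w -= 0.5 * numerical_grad(f, w)

final_loss, final_grad = ce_loss_and_grad(w, X, y)
print("final CE loss:", final_loss, "grad norm:", np.linalg.norm(final_grad))
```

Note that this sketch penalizes one global gradient norm; the "balanced" part of BGP, which the paper claims works without detailed class priors, is not captured here.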


Cited By

  • (2023) RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. In Proceedings of the 31st ACM International Conference on Multimedia, 3817-3826. DOI: 10.1145/3581783.3612360. Online publication date: 26-Oct-2023.
  • (2023) ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 11311-11321. DOI: 10.1109/ICCV51070.2023.01042. Online publication date: 1-Oct-2023.
  • (2023) Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3499-3509. DOI: 10.1109/CVPR52729.2023.00341. Online publication date: Jun-2023.


Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. flat minima
  2. long-tailed learning
  3. regularization

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
