
Balanced Gradient Penalty Improves Deep Long-Tailed Learning

Published: 10 October 2022

Abstract

In recent years, deep learning has achieved great success in various image recognition tasks. However, real-world applications often exhibit a long-tailed distribution over semantic classes. Common methods focus on optimization for balanced distributions or on naive models; few works study long-tailed learning from the perspective of deep-learning generalization. This work is the first to investigate the loss landscape of long-tailed learning. Empirical results show that sharpness-aware optimizers do not work well in the long-tailed setting: they do not take class priors into account and fail to improve the performance of few-shot classes. To better guide the network and explicitly alleviate sharpness without extra computational burden, we develop a universal Balanced Gradient Penalty (BGP) method. Notably, BGP does not require detailed class priors and thus preserves privacy. As a regularization loss, BGP achieves state-of-the-art results on various image datasets (i.e., CIFAR-LT, ImageNet-LT, and iNaturalist-2018) across different imbalance ratios.
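The paper's actual BGP algorithm is not reproduced here. As an illustrative sketch only, the toy example below shows the generic idea the abstract builds on: adding a squared gradient-norm penalty to the training loss as a flatness regularizer, demonstrated on a deliberately imbalanced binary dataset. The logistic model, the penalty weight `lam`, and all function names are our own assumptions for illustration, not the paper's method.

```python
import numpy as np

def ce_loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic model and its gradient in w."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def gp_loss(w, X, y, lam=0.5):
    """Plain loss plus a squared gradient-norm penalty (a flatness proxy)."""
    loss, grad = ce_loss_and_grad(w, X, y)
    return loss + lam * np.sum(grad ** 2)

def numerical_grad(f, w, h=1e-5):
    """Central-difference gradient, so we avoid deriving the Hessian term
    that an exact gradient of the penalty would require."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2.0 * h)
    return g

# Toy long-tailed binary problem: 90% of samples come from class 0.
rng = np.random.default_rng(0)
n0, n1 = 180, 20
X = np.vstack([rng.normal(-1.0, 1.0, (n0, 2)),
               rng.normal(1.0, 1.0, (n1, 2))])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Gradient descent on the penalized objective.
w = np.zeros(2)
f = lambda w_: gp_loss(w_, X, y)
for _ in range(200):
    w -= 0.5 * numerical_grad(f, w)

final_loss, final_grad = ce_loss_and_grad(w, X, y)
print("final CE loss:", final_loss, "grad norm:", np.linalg.norm(final_grad))
```

Note that this sketch penalizes one global gradient norm; the "balanced" part of BGP, which the paper claims works without detailed class priors, is not captured here.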


Cited By

  • (2023) RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. In Proceedings of the 31st ACM International Conference on Multimedia, 3817-3826. DOI: 10.1145/3581783.3612360. Online publication date: 26-Oct-2023.
  • (2023) ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 11311-11321. DOI: 10.1109/ICCV51070.2023.01042. Online publication date: 1-Oct-2023.
  • (2023) Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3499-3509. DOI: 10.1109/CVPR52729.2023.00341. Online publication date: Jun-2023.


Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. flat minima
  2. long-tailed learning
  3. regularization

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
