
Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection

  • Conference paper
Computational Visual Media (CVM 2024)

Abstract

Mainstream object detection methods are usually developed on datasets whose categories are balanced, and have made great progress in that setting; under a long-tail distribution, however, their performance remains unsatisfactory. In a long-tail distribution, a few head classes occupy most of the data while the many tail classes are under-represented, so tail classes receive excessive negative suppression during training. Existing methods mainly counter the suppression coming from negative samples of the tail classes to improve tail-class detection, while ignoring the suppression coming from correct background predictions. In this paper, we propose a new Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection (FBS-AEGL) to address this problem. First, we introduce a numerical factor among categories to weight the different classes, and then adaptively modulate the suppression exerted by head classes according to the logit values output by the network. Meanwhile, we dynamically adjust the suppression gradient of the background class to protect the head and common classes while improving the detection performance of the tail classes. We conduct comprehensive experiments on the challenging LVIS benchmark. FBS-AEGL achieves competitive results: 29.8% segmentation AP and 29.4% box AP on LVIS v0.5, and 28.8% segmentation AP and 29.4% box AP on LVIS v1.0 with a ResNet-101 backbone.
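The full formulation of FBS-AEGL appears only in the paper itself. As a rough illustration of the three ideas the abstract describes (a numerical factor that weights classes by frequency, logit-dependent scaling of negative-sample suppression, and separate damping of suppression from correct background predictions), the following is a minimal Python sketch. The function names, the inverse-frequency exponent `alpha`, and the `min(1, 1/w)` background damping are all illustrative assumptions, not the authors' actual definitions.

```python
import math

def class_weights(counts, alpha=0.5):
    """Numerical factor among categories: rarer classes get larger
    weights (inverse-frequency, normalized to mean 1). The smoothing
    exponent `alpha` is an assumption, not taken from the paper."""
    raw = [(max(counts) / c) ** alpha for c in counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def suppression_scale(logits, weights):
    """Adaptive scaling of negative-sample (suppression) gradients by
    the network's output logit: confident predictions suppress more,
    but dividing by the class weight suppresses tail classes less."""
    return [1.0 / (1.0 + math.exp(-z)) / w
            for z, w in zip(logits, weights)]

def background_scale(weights, gamma=1.0):
    """Separately damp the suppression gradient coming from correct
    background predictions: head/common classes keep the full gradient
    (scale 1.0), while tail classes are protected by a smaller scale."""
    return [min(1.0, (1.0 / w) ** gamma) for w in weights]
```

For example, with instance counts `[1000, 10]` the tail class receives the larger weight, so both its negative-sample suppression and its background suppression are scaled down relative to the head class, matching the qualitative behavior the abstract describes.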

This work was supported in part by the Zhengzhou Major Science and Technology Project under Grant 2021KJZX0060-6, in part by the China Postdoctoral Science Foundation under Grant 2021TQ0301, and in part by the National Natural Science Foundation of China under Grants 62372415, 62036010, and 62106232.



Author information

Correspondence to Pei Lv.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Hao, T. et al. (2024). Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection. In: Zhang, FL., Sharf, A. (eds) Computational Visual Media. CVM 2024. Lecture Notes in Computer Science, vol 14593. Springer, Singapore. https://doi.org/10.1007/978-981-97-2092-7_10


  • DOI: https://doi.org/10.1007/978-981-97-2092-7_10


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2091-0

  • Online ISBN: 978-981-97-2092-7

  • eBook Packages: Computer Science, Computer Science (R0)
