
Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection

  • Conference paper
Computational Visual Media (CVM 2024)

Abstract

Mainstream object detection methods are usually developed on datasets whose categories are balanced, and have made great progress in that setting; under a long-tail distribution, however, their performance remains unsatisfactory. In a long-tail distribution, a few head classes occupy most of the data while the many tail classes are under-represented, so tail classes receive excessive negative suppression during training. Existing methods mainly counter the suppression coming from negative samples of the tail classes to improve tail-class detection, while ignoring the suppression coming from correct background predictions. In this paper, we propose a new Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection (FBS-AEGL) to address this problem. First, we introduce a numerical factor among categories to weight the different classes, and then adaptively modulate the suppression exerted by head classes according to the logit values output by the network. Meanwhile, we dynamically adjust the suppression gradient of the background class to protect the head and common classes while improving the detection performance of the tail classes. We conduct comprehensive experiments on the challenging LVIS benchmark. FBS-AEGL achieves competitive results: 29.8% segmentation AP and 29.4% box AP on LVIS v0.5, and 28.8% segmentation AP and 29.4% box AP on LVIS v1.0 with a ResNet-101 backbone.
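The full formulation of FBS-AEGL appears only in the paper itself. As a rough illustration of the three ideas the abstract describes (a numerical factor that weights classes by frequency, logit-dependent scaling of negative-sample suppression, and separate damping of suppression from correct background predictions), the following is a minimal Python sketch. The function names, the inverse-frequency exponent `alpha`, and the `min(1, 1/w)` background damping are all illustrative assumptions, not the authors' actual definitions.

```python
import math

def class_weights(counts, alpha=0.5):
    """Numerical factor among categories: rarer classes get larger
    weights (inverse-frequency, normalized to mean 1). The smoothing
    exponent `alpha` is an assumption, not taken from the paper."""
    raw = [(max(counts) / c) ** alpha for c in counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def suppression_scale(logits, weights):
    """Adaptive scaling of negative-sample (suppression) gradients by
    the network's output logit: confident predictions suppress more,
    but dividing by the class weight suppresses tail classes less."""
    return [1.0 / (1.0 + math.exp(-z)) / w
            for z, w in zip(logits, weights)]

def background_scale(weights, gamma=1.0):
    """Separately damp the suppression gradient coming from correct
    background predictions: head/common classes keep the full gradient
    (scale 1.0), while tail classes are protected by a smaller scale."""
    return [min(1.0, (1.0 / w) ** gamma) for w in weights]
```

For example, with instance counts `[1000, 10]` the tail class receives the larger weight, so both its negative-sample suppression and its background suppression are scaled down relative to the head class, matching the qualitative behavior the abstract describes.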

This work was supported in part by the Zhengzhou Major Science and Technology Project under Grant 2021KJZX0060-6, in part by the China Postdoctoral Science Foundation under Grant 2021TQ0301, and in part by the National Natural Science Foundation of China under Grants 62372415, 62036010, and 62106232.



Author information

Correspondence to Pei Lv.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Hao, T. et al. (2024). Foreground and Background Separate Adaptive Equilibrium Gradients Loss for Long-Tail Object Detection. In: Zhang, FL., Sharf, A. (eds) Computational Visual Media. CVM 2024. Lecture Notes in Computer Science, vol 14593. Springer, Singapore. https://doi.org/10.1007/978-981-97-2092-7_10


  • DOI: https://doi.org/10.1007/978-981-97-2092-7_10


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2091-0

  • Online ISBN: 978-981-97-2092-7

  • eBook Packages: Computer Science, Computer Science (R0)
