A MBGD enhancement method for imbalance smoothing

Multimedia Tools and Applications

Abstract

This paper addresses foreground-foreground imbalance in object detection. First, we introduce Mini-batch Stochastic Gradient Descent (MBGD) with YOLO and the foreground-foreground imbalance problem. Then a T-distribution is devised and proved to smooth the imbalanced distribution and to allocate at least one representative to each class. Furthermore, the Mini-Batch Imbalance Smoothing method (MB-IS) is proposed, which addresses foreground-foreground imbalance by following the T-distribution and proportionally assigning class weights within a mini-batch. Finally, extensive experiments on our own transaction dataset and the VOC2007 dataset demonstrate the superiority of MB-IS for certain mini-batch sizes.
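
The abstract does not give the definition of the T-distribution, so the following is only a minimal sketch of the general idea it describes: every class gets at least one representative in a mini-batch, and the remaining slots are shared in proportion to class frequency. The function name `allocate_slots`, the largest-remainder rounding, and the example counts are illustrative assumptions, not the authors' method.

```python
def allocate_slots(class_counts, batch_size):
    """Assign mini-batch slots per class (hypothetical helper, not from the paper).

    Each class first receives one slot; the remaining slots are divided in
    proportion to class frequency, using largest-remainder rounding so the
    total is exactly batch_size.
    """
    classes = sorted(class_counts)
    if batch_size < len(classes):
        raise ValueError("batch_size must be at least the number of classes")

    total = sum(class_counts.values())
    remaining = batch_size - len(classes)

    # Fractional share of the remaining slots for each class.
    shares = {c: remaining * class_counts[c] / total for c in classes}

    # One guaranteed slot plus the integer part of the proportional share.
    slots = {c: 1 + int(shares[c]) for c in classes}

    # Hand out slots lost to rounding, largest fractional part first.
    leftover = batch_size - sum(slots.values())
    by_fraction = sorted(
        classes, key=lambda c: shares[c] - int(shares[c]), reverse=True
    )
    for c in by_fraction[:leftover]:
        slots[c] += 1
    return slots


if __name__ == "__main__":
    # Assumed toy class counts for illustration only.
    counts = {"a": 90, "b": 9, "c": 1}
    print(allocate_slots(counts, batch_size=8))
```

With a purely proportional split, the rare classes in this toy example would round down to zero slots in a batch of 8; the guaranteed slot is what keeps them represented.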

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China under grant No. 61702351, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No. 17KJB520036, the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant No. SZS201609, the Suzhou Science and Technology Plan Project under grant No. SYG201903, and the Computer Basic Education Teaching Research Project under grant No. 2018-AFCEC-328.

Author information

Corresponding author

Correspondence to Xusheng Ai.

Ethics declarations

Conflict of Interests

This study was funded by Natural Science Foundation of China (grant number: 61876217, 62176175), Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number: 17KJB520036), and Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (grant number: SZS201609), Suzhou Science and Technology Plan Project (grant number: SYG201903). The authors declare that they have no conflict of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Ai, X., Sheng, V.S. & Li, C. A MBGD enhancement method for imbalance smoothing. Multimed Tools Appl 81, 24225–24243 (2022). https://doi.org/10.1007/s11042-022-12697-3

  • DOI: https://doi.org/10.1007/s11042-022-12697-3
