Abstract
This paper addresses foreground-foreground imbalance in object detection. First, we introduce Mini-Batch Stochastic Gradient Descent (MBGD) as used with YOLO, along with the foreground-foreground imbalance problem. We then devise the T-distribution and prove that it smooths the imbalanced distribution and allocates at least one representative to each class. Furthermore, we propose the Mini-Batch Imbalance Smoothing method (MB-IS), which addresses foreground-foreground imbalance by following the T-distribution and proportionally assigning class weights within a mini-batch. Finally, extensive experiments on our own transaction dataset and the VOC2007 dataset demonstrate the superiority of MB-IS at certain mini-batch sizes.
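The abstract does not define the T-distribution, so the following is only a minimal sketch of the two properties it states: every class receives at least one representative in a mini-batch, and the remaining capacity is assigned in proportion to class frequency. The function name allocate_batch_slots, the largest-remainder rounding, and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def allocate_batch_slots(labels, batch_size):
    """Split a mini-batch into per-class slots.

    Hypothetical illustration, not the paper's T-distribution:
    each class gets one guaranteed slot, and the spare slots are
    shared in proportion to how often each class occurs.
    """
    counts = Counter(labels)
    classes = sorted(counts)
    if batch_size < len(classes):
        raise ValueError("batch_size must be at least the number of classes")

    total = sum(counts.values())
    spare = batch_size - len(classes)

    # Fractional share of the spare slots for each class.
    shares = {c: spare * counts[c] / total for c in classes}

    # One guaranteed slot plus the whole part of the proportional share.
    slots = {c: 1 + int(shares[c]) for c in classes}

    # Hand out slots lost to rounding, largest fractional part first.
    leftover = batch_size - sum(slots.values())
    by_remainder = sorted(
        classes, key=lambda c: shares[c] - int(shares[c]), reverse=True
    )
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots


if __name__ == "__main__":
    # Toy long-tailed label set: 90 / 9 / 1 instances of three classes.
    toy_labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
    print(allocate_batch_slots(toy_labels, batch_size=8))
```

With plain proportional sampling, the rarest class in this toy set would usually be absent from a mini-batch of eight; the guaranteed slot is what keeps it represented. How MB-IS turns such an allocation into class weights is specified in the paper itself, not here.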








Acknowledgements
This research was partially supported by the National Natural Science Foundation of China under grant No. 61702351, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No. 17KJB520036, the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant No. SZS201609, the Suzhou Science and Technology Plan Project under grant No. SYG201903, and the Computer Basic Education Teaching Research Project under grant No. 2018-AFCEC-328.
Ethics declarations
Conflict of Interests
This study was funded by the Natural Science Foundation of China (grant numbers: 61876217, 62176175), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number: 17KJB520036), the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (grant number: SZS201609), and the Suzhou Science and Technology Plan Project (grant number: SYG201903). The authors declare that they have no conflict of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.
Cite this article
Ai, X., Sheng, V.S. & Li, C. A MBGD enhancement method for imbalance smoothing. Multimed Tools Appl 81, 24225–24243 (2022). https://doi.org/10.1007/s11042-022-12697-3
DOI: https://doi.org/10.1007/s11042-022-12697-3