Abstract
Non-maximum suppression (NMS) is a post-processing step in most object detection pipelines. It is a greedy algorithm that uses the Intersection over Union (IoU) of bounding boxes to reduce false positives by removing redundant, highly overlapping boxes; however, it does not fully exploit the geometric distribution of the bounding boxes. It is observed that the distribution of the bounding boxes’ center points corresponds to the distribution of objects: local areas where center points are densely clustered are likely to contain objects, whereas local areas where center points are sparse are treated as false positives or detector noise. In this work, a density-based NMS (DB-NMS) is proposed, which uses the density distribution of the bounding boxes’ center points to evaluate the importance of different anchors. The proposed DB-NMS obtains better results than the original NMS on the MS-COCO 2017 dataset. Because DB-NMS does not change the network structure, it can be easily integrated into existing object detection pipelines to improve their performance. Pipelines such as Faster R-CNN and RetinaNet can be integrated with the proposed DB-NMS with little degradation in computational efficiency.
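To make the two ingredients named in the abstract concrete, the following minimal Python sketch shows the classic greedy IoU-based NMS and one possible way to measure the local density of bounding-box center points. It is an illustration under stated assumptions, not the authors' DB-NMS: the function names, the `radius` parameter, and the neighbor-counting definition of density are hypothetical choices made here, since the abstract does not specify how density is computed or how it is combined with suppression.

```python
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Classic greedy NMS: keep the top-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Retain only candidates whose overlap with the kept box is small enough.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def center_density(boxes: List[Box], radius: float) -> List[int]:
    """Hypothetical density measure (an assumption, not taken from the paper):
    the number of other box centers lying within `radius` of each center."""
    centers = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes]
    return [
        sum(
            1
            for j, q in enumerate(centers)
            if j != i and hypot(p[0] - q[0], p[1] - q[1]) <= radius
        )
        for i, p in enumerate(centers)
    ]


# Three overlapping detections of one object plus one isolated, low-score box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (11, 9, 49, 51), (200, 200, 220, 220)]
scores = [0.9, 0.8, 0.7, 0.3]

kept = greedy_nms(boxes, scores)             # indices of boxes that survive NMS
density = center_density(boxes, radius=5.0)  # neighbor count per box center
```

In this toy example, greedy NMS keeps the isolated box because it overlaps nothing, even though no other center point lies near it. A density measure of the kind sketched here would flag that box as sparse, which is the intuition the abstract describes for treating such detections as likely false positives.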










Acknowledgements
This work was partially supported by the Fundamental Research Funds for the Central Universities (No. 2232021D-37), Natural Science Foundation of Shanghai (No. 21ZR1401700) and National Natural Science Foundation of China (No. 62176052).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Rui, L., Tang, Xs. & Hao, K. DB-NMS: improving non-maximum suppression with density-based clustering. Neural Comput & Applic 34, 4747–4757 (2022). https://doi.org/10.1007/s00521-021-06628-w
DOI: https://doi.org/10.1007/s00521-021-06628-w