Abstract
Non-Maximum Suppression (NMS) is an essential part of the object detection pipeline. However, because classification confidence and localization accuracy are not always consistent, NMS may mistakenly eliminate bounding boxes that have low classification confidence but high localization accuracy. In this paper, we propose an attention-based non-maximum suppression (ANMS) algorithm. It reconstructs an attention map by backpropagating the top-level object classification semantic information, and uses that map to obtain object location information. Furthermore, it integrates the classification confidence with the attention map of the detection bounding boxes to correct the inconsistency between classification confidence and object localization. On the PASCAL VOC2007 and PASCAL VOC2012 datasets, the proposed ANMS algorithm achieves performance improvements of 1.85 and 1.24, respectively, over the NMS algorithm. On the MS COCO dataset, it achieves a performance improvement of 0.3, which demonstrates the effectiveness of the ANMS algorithm.
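To make the failure mode described above concrete, the following is a minimal sketch of standard greedy NMS, followed by a toy rescoring step that mixes an attention map into the box scores. The abstract does not state how ANMS fuses the two signals, so `attention_rescore` (score multiplied by mean attention inside the box) is a hypothetical stand-in, not the authors' formula; the function names, the toy boxes, and the 10x10 attention map are likewise assumptions made for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS: keep the top-scoring box, drop heavy overlaps, repeat."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Retain only boxes that do not overlap the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

def attention_rescore(boxes, scores, attention):
    """HYPOTHETICAL fusion, not the paper's rule: score * mean attention in the box."""
    fused = np.empty_like(scores, dtype=float)
    for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
        fused[i] = scores[i] * attention[y1:y2, x1:x2].mean()
    return fused

# Toy setup: the object occupies pixels [2, 6) x [2, 6) of a 10x10 attention map.
attention = np.zeros((10, 10))
attention[2:6, 2:6] = 1.0
boxes = np.array([[2, 2, 6, 6],    # tight box, lower classification confidence
                  [2, 2, 7, 7]],   # looser box, higher classification confidence
                 dtype=float)
scores = np.array([0.6, 0.9])

plain_keep = greedy_nms(boxes, scores)                                      # [1]
fused_keep = greedy_nms(boxes, attention_rescore(boxes, scores, attention))  # [0]
```

In this toy case plain NMS keeps the looser box because it has the higher classification score, while the attention-weighted scores rank the tighter box first, so it survives suppression instead.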
Cite this article
Guo, C., Cai, M., Ying, N. et al. ANMS: attention-based non-maximum suppression. Multimed Tools Appl 81, 11205–11219 (2022). https://doi.org/10.1007/s11042-022-12142-5
DOI: https://doi.org/10.1007/s11042-022-12142-5