ABSTRACT
Greedy-NMS inherently raises a dilemma: a lower NMS threshold can reduce the recall rate, while a higher threshold introduces more false positives. The problem is more severe in pedestrian detection because instance density varies far more widely. However, previous work on NMS either ignores or only vaguely considers whether other pedestrians exist nearby. We therefore propose Nearby Objects Hallucinator (NOH), which pinpoints the objects near each proposal with a Gaussian distribution, together with NOH-NMS, which dynamically eases the suppression in regions that are likely to contain other objects. Compared to Greedy-NMS, our state-of-the-art method improves AP by $3.9\%$ and recall by $5.1\%$ and lowers MR$^{-2}$ by $0.8\%$ on CrowdHuman, reaching $89.0\%$ AP, $92.9\%$ recall, and $43.9\%$ MR$^{-2}$.
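The threshold dilemma and the idea of easing suppression where a nearby object is likely can be illustrated with a small sketch. `greedy_nms` is standard Greedy-NMS. `eased_nms` is a hypothetical, simplified stand-in, not the paper's NOH or NOH-NMS formulation. It assumes that each box carries a predicted isotropic 2-D Gaussian `(mu_x, mu_y, sigma)` for the center of its nearest neighbor, and it linearly raises the IoU threshold from `thresh` toward `max_thresh` as the candidate's center becomes more likely under the kept box's Gaussian. The function names, the Gaussian parameterization, and the interpolation rule are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Gauss = Tuple[float, float, float]        # hypothetical (mu_x, mu_y, sigma)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Keep the top-scoring box, drop remaining boxes with IoU > thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep


def eased_nms(boxes: Sequence[Box], scores: Sequence[float], nearby: Sequence[Gauss],
              thresh: float, max_thresh: float) -> List[int]:
    """Hypothetical variant: the IoU limit grows where a neighbor is predicted."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        mx, my, sigma = nearby[best]
        survivors = []
        for i in order:
            cx = (boxes[i][0] + boxes[i][2]) / 2
            cy = (boxes[i][1] + boxes[i][3]) / 2
            # Unnormalized Gaussian likelihood in [0, 1] that box i is the
            # neighbor predicted for the kept box.
            lik = math.exp(-((cx - mx) ** 2 + (cy - my) ** 2) / (2 * sigma ** 2))
            # Interpolate the suppression threshold between thresh and max_thresh.
            limit = thresh + (max_thresh - thresh) * lik
            if iou(boxes[best], boxes[i]) <= limit:
                survivors.append(i)
        order = survivors
    return keep


if __name__ == "__main__":
    # Two real, heavily overlapping pedestrians (0 and 1) plus a duplicate of 0 (2).
    boxes = [(0, 0, 40, 100), (10, 0, 50, 100), (1, 1, 41, 101)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores, 0.5))    # drops the real neighbor
    print(greedy_nms(boxes, scores, 0.95))   # keeps the duplicate
    nearby = [(30, 50, 5), (20, 50, 5), (30, 50, 5)]
    print(eased_nms(boxes, scores, nearby, 0.5, 0.95))
```

In this toy scene the neighbor's IoU with box 0 is 0.6 and the duplicate's is about 0.93. Greedy-NMS at 0.5 returns `[0]` (lost recall) and at 0.95 returns `[0, 1, 2]` (a false positive). The eased variant returns `[0, 1]`: the real neighbor sits exactly at the predicted center and gets a limit of 0.95, while the duplicate's center is far from it, so its limit stays near 0.5.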
Index Terms
- NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination