Elsevier

Neurocomputing

Volume 432, 7 April 2021, Pages 101-110
MAPD: An improved multi-attribute pedestrian detection in a crowd

https://doi.org/10.1016/j.neucom.2020.12.005

Abstract

Recently, pedestrian detection based on convolutional neural networks (CNNs) has made significant progress, but pedestrian detection in a crowd remains a challenge. Pedestrians in a crowd stand close to each other, which makes it difficult to discriminate individuals. To address this issue, we propose an improved multi-attribute pedestrian detection (MAPD) method, which optimizes person intra-class compactness and inter-class discrepancy. The contributions are fourfold: (1) We analyze the effect of the positive setting on the detector and adopt a better positive setting strategy to mitigate the extreme class imbalance problem. (2) Inspired by Person Reid, we employ the triplet loss function to learn advanced id features of pedestrians. (3) We propose a novel Piecewise NMS algorithm to reduce false positives on small objects. (4) We propose a novel multi-attribute NMS algorithm based on the Piecewise NMS algorithm and id information, which can adaptively distinguish the predicted boxes of different pedestrians and improve detector performance. Finally, we evaluate the MAPD detector on two benchmark datasets, CityPersons and CrowdHuman. Results show that our approach outperforms state-of-the-art methods by a large margin.

Introduction

Pedestrian detection, a branch of computer vision, plays an essential role in fields such as autonomous driving and robotics. It has developed tremendously, evolving from hand-crafted features [1], [2], [3] to features extracted by deep learning [4], [5], [6], [7].

In recent years, the rapid development of society has driven the need for more advanced pedestrian detection in crowded scenarios. Existing pedestrian detectors perform well when there is no occlusion but poorly in a crowd. Current research [8] illustrates this: the reasonable index (a miss rate, so lower is better) has reached an excellent score of 8.8% on the CityPersons dataset [9], whereas the heavy index is far worse at 46.6%. The result degrades further in a crowd: the reasonable index is 35.76% on the CrowdHuman dataset [10]. Our goal is to improve the performance of pedestrian detectors in a crowd.

As a subset of object detection, pedestrian detection has inherited many successful techniques. Object detectors are grouped into anchor-based and anchor-free detectors. The former, such as Faster R-CNN [11], SSD [12], and YOLOv2/v3 [13], [14], train and test the network with a set of predefined anchors, whose parameters must be obtained from the training dataset. In contrast, the latter, such as CornerNet [15], FCOS [16], and CenterNet [17], train and test the network without anchors, which gives a more flexible network structure and requires less processing of the training dataset. In this paper, we adopt the anchor-free method to build the network structure.
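As a rough illustration of what anchor-free prediction means in practice, the sketch below decodes a center heatmap and a height map into boxes without any anchor set. It is a minimal sketch in the spirit of center-and-scale detectors, not the MAPD implementation: the function name `decode_centers`, the stride of 4, the score threshold, and the fixed width-to-height ratio of 0.41 are all assumptions made for this example.

```python
import numpy as np

def decode_centers(center_map, height_map, stride=4, score_thresh=0.5,
                   aspect_ratio=0.41):
    """Turn per-cell center scores and heights into boxes, with no anchors.

    center_map: (H, W) array of center probabilities in [0, 1].
    height_map: (H, W) array of predicted box heights in input-image pixels.
    Returns an (N, 5) array of [x1, y1, x2, y2, score].
    All default parameter values are assumptions for illustration.
    """
    # Every cell whose center score passes the threshold yields one box.
    ys, xs = np.nonzero(center_map > score_thresh)
    scores = center_map[ys, xs]
    heights = height_map[ys, xs]
    widths = aspect_ratio * heights  # assumed fixed pedestrian aspect ratio
    # Map feature-map cells back to input-image coordinates (cell centers).
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride
    return np.stack([cx - widths / 2, cy - heights / 2,
                     cx + widths / 2, cy + heights / 2, scores], axis=1)
```

No anchor shapes have to be estimated from the training data here; the box geometry comes directly from the predicted maps.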

For pedestrian detection in crowded scenarios, Zhang et al. recently proposed the CSID (Center, Scale, and Id Prediction) [8] detection method, which achieves the best results on both the CityPersons and CrowdHuman datasets. Building on CSP (Center and Scale Prediction) [18], it introduces a class balance scheme to tackle extreme class imbalance during training. In addition, it obtains advanced density and id information by adding an attribute map, and refines the predicted boxes using this information. However, CSID has three shortcomings. First, it directly sets the 2 × 2 region at the center of the object as positive to reduce the impact of extreme class imbalance, whereas more positive settings should be explored, such as 1 × 1, 2 × 2, 3 × 3, and 4 × 4 regions. Second, the class balance strategy yields more false positives on small targets. Third, it is difficult for two pedestrian attributes to converge on one map. We propose the MAPD pedestrian detector to solve these problems.
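To make the positive setting concrete, the following sketch marks a k × k block of cells around each object center as positive on the output map. It is only an illustration under stated assumptions: the helper name `center_positive_mask` is hypothetical, and the way an even-sized block (2 × 2, 4 × 4) is placed around the center cell is a choice made here, not something the text specifies.

```python
import numpy as np

def center_positive_mask(shape, centers, k):
    """Mark a k x k block around each object center as positive.

    shape: (H, W) of the output map.
    centers: iterable of (row, col) center cells.
    k: side length of the positive region (1, 2, 3, 4, ...).
    """
    mask = np.zeros(shape, dtype=bool)
    lo = (k - 1) // 2  # cells above/left of the center (assumed placement)
    hi = k - lo        # exclusive extent below/right of the center
    for r, c in centers:
        # Clip the block so it stays inside the map near the borders.
        r0, r1 = max(r - lo, 0), min(r + hi, shape[0])
        c0, c1 = max(c - lo, 0), min(c + hi, shape[1])
        mask[r0:r1, c0:c1] = True
    return mask
```

With one pedestrian on a 128 × 256 map, a 1 × 1 setting gives 1 positive cell against 32,767 negatives, while a 3 × 3 setting gives 9 positives, which is the sense in which a larger region eases the class imbalance.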

In summary, the contributions of this paper are as follows:

  • (1) We set different regions at the center of the object as positive and determine the optimal positive setting through experiments (the 3 × 3 region performs best), which effectively alleviates the extreme class imbalance.

  • (2) We design two maps to obtain id and density information respectively, which removes the interference caused by predicting two attributes on one map. Triplet loss [18] is applied to optimize the representation of the id map, so that highly discriminative features are learned for different pedestrians.

  • (3) We put forward a novel Piecewise NMS algorithm that reduces false positives on small targets by considering density information and the class balance strategy.

  • (4) We propose a Multi-attribute NMS algorithm that combines Piecewise NMS with id information, using more pedestrian attribute information to refine the detection results.
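For readers unfamiliar with the triplet loss mentioned in contribution (2), the sketch below shows its standard form: an anchor embedding is pulled toward an embedding of the same identity and pushed away from an embedding of a different identity by at least a margin. This is the generic formulation, not the authors' exact training code; the margin value of 0.3, the Euclidean distance, and the way embeddings would be sampled from the id map are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss, averaged over a batch of embeddings.

    anchor, positive, negative: (N, D) arrays. `positive` shares the
    anchor's identity and `negative` does not. The margin is an assumed
    value chosen for illustration.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # same-identity distance
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # different-identity distance
    # The loss is zero once the negative is farther than the positive by `margin`.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

Minimizing this quantity shrinks distances within an identity and enlarges distances between identities, which matches the stated goal of intra-class compactness and inter-class discrepancy.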

Experiments show that the MAPD detector scores 8.5% on the reasonable index of the CityPersons dataset and 27.75% on the reasonable index of the CrowdHuman dataset, outperforming state-of-the-art pedestrian detectors. Fig. 1 presents an example MAPD detection result on the CrowdHuman dataset, which shows the excellent performance of our method in a crowd.

Related work

In this section, we first introduce related work on pedestrian detection. Then we review the key techniques of Person Reid [19]. Finally, we summarize current work on detection in a crowd.

MAPD pedestrian detector

In this section, we first introduce the overall structure of the MAPD detector, including feature extraction and detector head. Then, five branches of the detector head are introduced, which include the design of ground truth targets and loss functions. Finally, a multi-attribute NMS algorithm is proposed to improve the performance of the detector.
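As background for the NMS variants, the sketch below shows standard greedy NMS with a single fixed IoU threshold. It is the common baseline that attribute-aware variants modify, not the Piecewise or multi-attribute NMS proposed here, whose details are not given in this excerpt; the function names and the threshold of 0.5 are assumptions.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS; returns indices of kept boxes, best score first.

    Assumes boxes have positive area. The threshold is an assumed value.
    """
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[overlaps <= iou_thresh]
    return keep
```

Because the threshold is fixed, two real pedestrians whose boxes overlap heavily can be merged into one detection, which is the crowd failure case that motivates using extra attributes during suppression.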

Experiments

To demonstrate the effectiveness of the proposed MAPD detector, we conduct experiments on the CityPersons dataset [9] and the CrowdHuman dataset [10].

Conclusions

In this paper, we propose an MAPD detector for crowd detection, which obtains center, scale, offset, density and id attributes through five output maps. We first optimize person intra-class compactness and inter-class discrepancy and propose a multi-attribute NMS algorithm to distinguish highly overlapping crowds. Besides, we design a class balance strategy to alleviate training difficulties caused by extreme class imbalance and propose a Piecewise NMS algorithm to reduce false positives for small

CRediT authorship contribution statement

Yang Wang: Conceptualization, Methodology, Software, Writing - original draft, Formal analysis. Chong Han: Methodology, Writing - original draft, Formal analysis. Guangle Yao: Formal analysis, Writing - review & editing. Wanlin Zhou: Formal analysis, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • P. Dollár, Z. Tu, P. Perona, S. Belongie, Integral channel...
  • W. Nam, P. Dollár, J.H. Han, Local decorrelation for improved detection, arXiv preprint...
  • S. Zhang, R. Benenson, B. Schiele, et al., Filtered channel features for pedestrian detection, in: CVPR, Vol. 1, 2015,...
  • X. Wang et al., Repulsion loss: detecting pedestrians in a crowd
  • S. Zhang et al., How far are we from solving pedestrian detection?
  • S. Zhang et al., Occlusion-aware R-CNN: detecting pedestrians in a crowd
  • S. Zhang et al., Occluded pedestrian detection through guided attention in CNNs
  • J. Zhang, L. Lin, Y.-C. Chen, Y. Hu, S.C. Hoi, J. Zhu, CSID: Center, scale, identity and density-aware pedestrian...
  • S. Zhang et al., CityPersons: a diverse dataset for pedestrian detection
  • S. Shao, Z. Zhao, B. Li, T. Xiao, G. Yu, X. Zhang, J. Sun, CrowdHuman: a benchmark for detecting human in a crowd,...
  • S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in:...
  • W. Liu et al., SSD: Single shot multibox detector
  • J. Redmon et al., YOLO9000: better, faster, stronger
  • J. Redmon, A. Farhadi, YOLOv3: an incremental improvement, arXiv preprint...
  • H. Law et al., Detecting objects as paired keypoints
  • Z. Tian et al., FCOS: fully convolutional one-stage object detection
  • K. Duan et al., CenterNet: keypoint triplets for object detection
  • W. Liu et al., High-level semantic feature detection: a new perspective for pedestrian detection
  • C. Su et al., Deep attributes driven multi-camera person re-identification
  • P. Dollár et al., Fast feature pyramids for object detection, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
Yang Wang received his master’s degree from Nanjing University of Aeronautics and Astronautics in 2017. He is currently studying for a Ph.D. at Nanjing University of Aeronautics and Astronautics. His research interests lie in the area of computer vision.

Chong Han is studying for a master’s degree at Nanjing University of Aeronautics and Astronautics. His research interests lie in the area of computer vision.

Guangle Yao received his Ph.D. degree from University of Electronic Science and Technology of China in 2019. He is now an associate professor at Chengdu University of Technology. His research interests lie in the area of computer vision.

Wanlin Zhou, born in January 1964, holds a Ph.D. degree. He is a professor and doctoral supervisor in the Department of Aerospace Manufacturing Engineering, School of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics. From 2001 to 2005, he worked in the State Key Laboratory of Aeronautical Intelligent Material and Structure, Nanjing University of Aeronautics and Astronautics, and obtained his doctorate in engineering. In recent years, he has mainly been engaged in research on intelligent materials and structures, artificial intelligence, intelligent manufacturing, industrial image recognition technology, and structural health monitoring in aviation engineering.
