PVDet: Towards pedestrian and vehicle detection on gigapixel-level images
Introduction
Object detection is one of the central tasks in computer vision. It aims to classify objects and locate them in images, and it underpins many practical applications. Pedestrian and vehicle detection are popular research topics in object detection, with rich applications in driver assistance systems, intelligent surveillance, and other fields. Frequent mutual occlusion of vehicles in cities, the large scale variation of vehicles in images, and the non-rigid nature of pedestrians aggravate the multi-pose and occlusion problems and pose a significant challenge for pedestrian and vehicle detection.
As photographic technology advances, gigapixel-level imaging equipment is progressively being integrated into various application scenarios. The COCO 2014 dataset (Lin et al., 2014), commonly used in previous studies, has a resolution of only 640 × 640. Higher-resolution datasets such as VisDrone 2018 (Zhu et al., 2020) (2k × 1.5k) and DOTA 2018 (Xia et al., 2018) (4k × 4k) are available, but each image contains only a few to a few tens of targets.
PANDA (Wang et al., 2020b), a gigapixel-level video and image dataset, was proposed by researchers at Tsinghua University to advance work on high-resolution images and videos in computer vision. The dataset features a wide FoV and high resolution (26k × 15k), up to 4k targets in a scene, and significant size differences between targets (100× scale variation). Gigapixel-level images bring more challenges to object detection, and several studies based on the PANDA dataset have emerged recently. Li et al. (2022) use a two-step cropping strategy to process the original high-resolution images and then apply the Region NMS algorithm to reduce the impact of targets that are cut by cropping. The IoU threshold directly influences the accuracy of the detection results; however, the optimal IoU threshold was not found, and the two-step cropping method is slow. To boost the speed of gigapixel-level detection, Chen et al. (2022) propose a real-time detector, GigaDet, which uses PGN (Patch Generation Network) modules to filter out regions unrelated to the targets of interest. However, its training process is not end-to-end, and pedestrian posture and occlusion features are not considered. Wei et al. (2022) extend detection to both people and vehicles. The authors propose SARNet, which uses transformer attention to optimize Faster R-CNN, and obtain practical improvements, but the two-stage detection algorithm gives the model a large computational overhead and parameter count. These methods have achieved good results, yet considerable room for improvement remains on gigapixel-level images.
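The crop-then-merge idea discussed above can be illustrated with a minimal sketch. It is not the two-step cropping or Region NMS algorithm of Li et al. (2022); it assumes plain overlapping tiles and ordinary greedy NMS as a stand-in, and the names `tile_origins`, `to_global`, and `nms`, as well as the tile size and overlap, are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that cover the range [0, length)."""
    assert 0 <= overlap < tile
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the border
    return origins


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (the score field is ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def to_global(box: Box, ox: int, oy: int) -> Box:
    """Shift a tile-local box by the tile origin (ox, oy)."""
    x1, y1, x2, y2, score = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy, score)


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep a box only if it overlaps no higher-scoring kept box."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept


# Hypothetical example: one pedestrian seen in the overlap of two tiles.
xs = tile_origins(length=1792, tile=1024, overlap=256)
detections = [
    to_global((800, 100, 850, 220, 0.9), xs[0], 0),
    to_global((32, 100, 82, 220, 0.8), xs[1], 0),
]
merged = nms(detections, iou_thr=0.5)
```

The choice of `iou_thr` decides whether duplicates from neighbouring tiles are merged, which is why the threshold has such a direct effect on the final accuracy.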
To address the problems mentioned above, this paper proposes a new end-to-end detector PVDet (Pedestrian and Vehicle Detection on Gigapixel-level Images), and the main contributions of this paper are as follows.
- (1) A novel backbone called DPRNet (Deformable deeP Residual Network) is proposed to improve feature extraction for targets of different shapes and for occluded targets.
- (2) To handle the large scale differences between targets and the small targets in gigapixel-level images, PAFPN (Path Aggregation Feature Pyramid Network) is used to process the multi-layer features extracted by the backbone, delivering high-resolution feature information through shorter paths. It iteratively fuses the multi-layer features to obtain a high-resolution feature map with richer semantic information and to improve the detection accuracy for small targets and targets of different scales.
- (3) To further exploit the information from PAFPN, multiple DyHead modules are introduced, which are scale-aware, spatial-aware, and task-aware. They enhance the detection head's ability to classify and localize pedestrians and vehicles in high-resolution images.
- (4) Extensive experiments show that the proposed method achieves the best performance on the PANDA dataset compared with other state-of-the-art methods. Additional experiments on PASCAL VOC 2007 verify the generalizability and effectiveness of the proposed method.
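The path aggregation idea in contribution (2) can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it assumes single-channel feature maps stored as nested lists, replaces the learned lateral and 3 × 3 convolutions with plain addition, and uses nearest-neighbour upsampling and stride-2 subsampling. All function names are hypothetical.

```python
from typing import List

Map = List[List[float]]


def upsample2x(m: Map) -> Map:
    """Nearest-neighbour 2x upsampling."""
    out: Map = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(m: Map) -> Map:
    """Stride-2 subsampling (stand-in for a learned stride-2 convolution)."""
    return [row[::2] for row in m[::2]]


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pafpn(features: List[Map]) -> List[Map]:
    """features[0] is the finest level; each following level is half the size."""
    # Top-down pathway (FPN): push semantics from coarse levels to fine levels.
    top_down = list(features)
    for i in range(len(top_down) - 2, -1, -1):
        top_down[i] = add(top_down[i], upsample2x(top_down[i + 1]))
    # Bottom-up pathway (path aggregation): send fine detail back to coarse levels.
    out = list(top_down)
    for i in range(1, len(out)):
        out[i] = add(out[i], downsample2x(out[i - 1]))
    return out


# Hypothetical three-level pyramid with constant maps of size 4x4, 2x2, 1x1.
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
n3, n4, n5 = pafpn([c3, c4, c5])
```

After the second pass every output level has received information from every input level, which is the property that helps with targets of very different sizes.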
The rest of the paper is organized as follows: the next section presents the related work of the paper, and the third section describes the method proposed in this paper. The fourth section compares and analyzes the method of this paper with advanced detection methods in a number of experiments. Finally, the fifth section provides a comprehensive summary of the paper.
Section snippets
Traditional methods
Traditional pedestrian and vehicle detection methods are presented in (Ren and Li, 2015, Mao et al., 2015, Wang et al., 2019, Zhou and Yu, 2021, Hua et al., 2021, Kim et al., 2015, Ali and Bayoumi, 2016, Satzoda and Trivedi, 2015, Yuan et al., 2016). Ren and Li (2015) adopted the LogitBoost algorithm combined with a mapped HOG (Histogram of Oriented Gradients) descriptor to train the classifier, which improves the training efficiency of the pedestrian detector. Considering that
PVDet
In gigapixel-level resolution images, pedestrian and vehicle targets are characterized by large-scale variations, wide distribution, severe target occlusion, and deformation problems, and smaller targets are more challenging to detect. We construct a new pedestrian and vehicle detection model called PVDet, based on the basic idea of adaptive sample selection to cope with these problems. Section 3.1 presents the proposed overall framework for pedestrian and vehicle detection. In Section 3.2, a
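The adaptive sample selection idea mentioned in this section can be illustrated as follows. The sketch assumes the ATSS-style rule (candidates closest to the ground-truth centre, IoU threshold set to mean plus standard deviation) on a single feature level; it uses the population standard deviation for simplicity, and `atss_positives` and its parameters are hypothetical names, not the paper's code.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def atss_positives(anchors: List[Box], gt: Box, k: int = 9) -> Tuple[List[int], float]:
    """Return indices of positive anchors for one ground-truth box and the threshold."""
    gt_c = center(gt)
    # 1) Candidates: the k anchors whose centres are closest to the GT centre.
    order = sorted(range(len(anchors)), key=lambda i: math.dist(center(anchors[i]), gt_c))
    candidates = order[:k]
    # 2) Adaptive threshold from the IoU statistics of the candidates.
    ious = [iou(anchors[i], gt) for i in candidates]
    threshold = mean(ious) + pstdev(ious)
    # 3) Positives: IoU at or above the threshold and centre strictly inside the GT box.
    positives = []
    for i, v in zip(candidates, ious):
        cx, cy = center(anchors[i])
        if v >= threshold and gt[0] < cx < gt[2] and gt[1] < cy < gt[3]:
            positives.append(i)
    return sorted(positives), threshold


# Hypothetical anchors around one ground-truth box.
gt_box = (0.0, 0.0, 10.0, 10.0)
anchor_boxes = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
pos, thr = atss_positives(anchor_boxes, gt_box, k=3)
```

Because the threshold is derived per ground-truth box, small and large targets each get a cut-off that fits their own candidate statistics instead of one fixed IoU value.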
Dataset
PANDA-Image (Wang et al., 2020b) is the first human-centric gigapixel-level dataset. The PANDA-Image dataset consists of 600 live images in a variety of scenes at a resolution of approximately 26k × 15k, with a wide field of view, allowing thousands of targets to be observed simultaneously over a scale variation of nearly a hundred times. As can be observed in Fig. 5, there is a considerable variation in scale between targets and irregular population distribution. The dataset
Conclusion
This paper proposes a pedestrian-vehicle detector, PVDet, for gigapixel-resolution images. First, we use Deformable ConvNets v2 to improve the backbone's modeling capability for deformed targets and to better extract pose-variant pedestrian features. Then higher-resolution feature information is aggregated using PAFPN to improve the detection performance for multi-scale and small targets. Subsequent multiple DyHead modules with scale-aware, spatial-aware, and task-aware capabilities are
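The modulated deformable convolution behind Deformable ConvNets v2 can be sketched for a single output location. In that formulation each of the nine taps of a 3 × 3 kernel samples the input at its regular grid position plus a learned offset, and the sampled value is scaled by a learned modulation scalar. The sketch below assumes a single-channel input and takes the offsets and modulation scalars as given inputs (in a real network they are predicted by an extra convolution); the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def bilinear(img: Image, y: float, x: float) -> float:
    """Bilinearly sample img at fractional (y, x); positions outside count as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def modulated_deform_conv_at(
    img: Image,
    py: int,
    px: int,
    weights: Sequence[float],
    offsets: Sequence[Tuple[float, float]],
    masks: Sequence[float],
) -> float:
    """Response of a 3x3 modulated deformable convolution at location (py, px)."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        off_y, off_x = offsets[k]
        # Regular grid position plus learned offset, scaled by the modulation scalar.
        out += weights[k] * bilinear(img, py + dy + off_y, px + dx + off_x) * masks[k]
    return out


# Hypothetical 3x3 input; zero offsets and unit masks reduce to a regular convolution.
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ones = [1.0] * 9
zero_offsets = [(0.0, 0.0)] * 9
regular = modulated_deform_conv_at(image, 1, 1, ones, zero_offsets, ones)

# Shift only the centre tap half a pixel to the right.
shifted_offsets = list(zero_offsets)
shifted_offsets[4] = (0.0, 0.5)
shifted = modulated_deform_conv_at(image, 1, 1, ones, shifted_offsets, ones)
```

With zero offsets and unit modulation the operation reduces to an ordinary 3 × 3 convolution, so the deformable layer can only add flexibility for non-rigid targets such as pedestrians.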
CRediT authorship contribution statement
Wanghao Mo: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Visualization. Wendong Zhang: Resources, Supervision, Funding acquisition. Hongyang Wei: Methodology, Formal analysis. Ruyi Cao: Validation, Data curation. Yan Ke: Validation. Yiwen Luo: Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China (2020D01C033) and Doctoral Research Fund Project of Xinjiang University, China (202112120001).
References (59)
- et al. Towards real-time object detection in GigaPixel-level video. Neurocomputing (2022)
- et al. LLA: Loss-aware label assignment for dense pedestrian detection. Neurocomputing (2021)
- et al. Region NMS-based deep network for gigapixel level pedestrian detection with two-step cropping. Neurocomputing (2022)
- et al. Research on pedestrian detection technology based on the SVM classifier trained by HOG and LTP features. Future Gener. Comput. Syst. (2021)
- et al. Towards real-time DPM object detector for driver assistance
- et al. YOLOv4: Optimal speed and accuracy of object detection (2020)
- Cai, Zhaowei, Vasconcelos, Nuno, 2018. Cascade R-CNN: Delving into high quality object detection. In: Proceedings of...
- et al. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
- et al. Fusion-based feature attention gate component for vehicle detection based on event camera. IEEE Sens. J. (2021)
- et al. MMDetection: Open MMLab detection toolbox and benchmark (2019)
- TOOD: Task-aligned one-stage object detection
- Ratio-and-scale-aware YOLO for pedestrian detection. IEEE Trans. Image Process.
- SINet: A scale-insensitive convolutional neural network for fast vehicle detection. IEEE Trans. Intell. Transp. Syst.
- Pedestrian-and vehicle-detection algorithm based on improved aggregated channel features. IEEE Access
- A novel on-road vehicle detection method using HOG. IEEE Trans. Intell. Transp. Syst.
- Deep learning approaches on pedestrian detection in hazy weather. IEEE Trans. Ind. Electron.
- Target-guided feature super-resolution for vehicle detection in remote sensing images. IEEE Geosci. Remote Sens. Lett.
- GAN-based day-to-night image style transfer for nighttime vehicle detection. IEEE Trans. Intell. Transp. Syst.
- Microsoft COCO: Common objects in context
Wanghao Mo received the B.E. degree from Wuhan university institute of splendid, Wuhan, China, in 2019. He is currently pursuing the M.S. degree with the Institute of Software in Xinjiang University. His research interests include computer vision, automated driving, and deep learning.
Wendong Zhang received his B.S. and master’s degrees from Xinjiang University, Urumqi, China, in 1998 and 2005 respectively, and Ph.D. degree from Xi’an Jiaotong University, China, in 2019. He is currently working as an Associate Professor with Xinjiang University. His research interests include Edge Computing, IoT technology, Ad Hoc networks and Machine Learning.
Hongyang Wei graduated from Chongqing University of Technology in Chongqing, China, with a bachelor’s degree in engineering, in 2018. Now he is a postgraduate majoring in software engineering at Xinjiang University, China, and his main research fields are target detection and semantic segmentation.
Ruyi Cao received the B.S. degree in computer science and technology from Yanshan University, Qinhuangdao, China, in 2021. She is currently pursuing a master’s degree in engineering at the School of Software, Xinjiang University. Her research interests include computer vision and deep learning.
Yan Ke received the B.B.A. degree from Jiangxi Normal University, Nanchang, China, in 2020. She is currently pursuing the M.S. degree with the Institute of Software in Xinjiang University. Her research interests include computer vision and deep learning.
Yiwen Luo received the B.E. degree in computer science and technology from Hunan University of Finance and Economics, Changsha, China, in 2021. He is currently pursuing the M.S. degree with the Institute of Software in Xinjiang University. His research interests include machine learning and the vehicle routing problem with drones.