Thermal infrared pedestrian tracking using joint siamese network and exemplar prediction model
Introduction
Pedestrian tracking has been an active topic in visual tracking over the past few years. Recently, the falling price and improving image quality of thermal infrared (TIR) cameras have made pedestrian tracking in TIR image sequences increasingly popular. Because TIR images are formed from the radiation emitted by observed objects, TIR-based pedestrian trackers can work under low-light conditions and even in total darkness. Moreover, TIR pedestrian trackers are robust to illumination changes and shadows, and adapt well to different environments. TIR pedestrian tracking is now applied across a broad spectrum of computer vision tasks, from intelligent vehicles to battlefield reconnaissance. However, due to the inherent characteristics of TIR images (e.g., the absence of color and texture information, energy attenuation, and sensor noise), TIR pedestrian tracking remains challenging.
Traditional filtering-based techniques are widely used for TIR pedestrian tracking. By integrating multiple cues of a target with a particle filter (PF), two improved TIR pedestrian trackers are presented in [34,36]. A PF-based algorithm for tracking multiple TIR pedestrian targets, built on partial least squares regression and heuristic computation, is proposed in [39]. Kalman filtering is adopted in TIR automotive night vision to aid temporal association between frames [23]. A high-performance TIR object tracker based on continuous correlation filters and adaptive feature fusion is proposed in [42], while median filtering is adopted for TIR intelligent surveillance in [37]. In addition, support vector machines [3,15], histograms of oriented gradients [30], and sparse representation [19,25] are also widely used in TIR pedestrian tracking.
Traditional methods can generally track a TIR pedestrian who has a clear shape and other detailed features. In most cases, however, TIR targets lack shape, texture, and color information and have low contrast against the background, which greatly degrades the performance of a traditional tracker. Recently, deep architectures have brought impressive advances in computer vision tasks and have become very popular in object tracking because of their robustness to external disturbances such as illumination changes, occlusion, and motion blur. Encouraged by these achievements, some studies have introduced deep models to TIR object tracking. Xu et al. performed several experiments on the SCUT dataset [40], showing that convolutional neural network (CNN)-based methods, such as Faster-RCNN [28] and MSCNN [2], perform well on TIR pedestrian detection. Liang et al. introduced automatic matting to a CNN model to identify pedestrians against cluttered backgrounds [20]. Since 2017, siamese network (SiamNet)-based trackers have attracted much attention for their high speed and accuracy. Variants of SiamNets, including the fully convolutional siamese network (SiamFC) [1], CFNet [33], and SiamRPN [18], are now very common in visible object tracking. For TIR object tracking, Shen et al. designed a TIR multi-pedestrian tracker in [31] based on Faster-RCNN [28] and an improved SiamFC, and reported promising tracking results.
However, neither the basic SiamFC nor the popular SiamRPN updates the exemplar (template) of a tracked target during tracking. In other words, they do not fully exploit the temporal information in a video, which greatly limits the tracking performance of SiamNet-based trackers, especially for non-rigid objects such as pedestrians. To further improve TIR pedestrian tracking performance, this paper proposes a CNN-based exemplar prediction model that uses both the spatial and the temporal information around a TIR pedestrian target. Then, considering the high tracking accuracy and fast speed of SiamRPN, we integrate our prediction model with SiamRPN to form an improved real-time TIR pedestrian tracker. The experimental results on the state-of-the-art benchmark dataset PTB-TIR [21] show clear advantages of our tracker over popular trackers such as the scale correlation filter DSST [5], kernelized correlation filters (KCF) [9], and SiamRPN [18] in terms of tracking success rate and tracking precision.
The remainder of this paper is organized as follows. Section 2 introduces the basic idea and characteristics of SiamNet-based tracking. Section 3 describes the proposed CNN-based exemplar prediction model, and Section 4 gives the details of the improved SiamRPN for tracking TIR pedestrians. Section 5 presents and discusses the experimental results, and Section 6 concludes the paper.
Section snippets
SiamNet-based tracking and its limitations
The pioneering work on using a SiamNet for object tracking was done by Tao et al. [32], but their tracker cannot run in real time. Bertinetto et al. then proposed a fully convolutional siamese network (SiamFC) that was trained offline end-to-end on the ILSVRC15 dataset [1]. They showed that SiamFC based on AlexNet achieved a tracking speed of 65 frames per second (fps).
The basic idea of the SiamFC [1] is to adopt deep CNNs and cross-correlation to implement similarity learning,
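The cross-correlation step at the core of this similarity learning can be illustrated with a minimal sketch. It assumes single-channel toy feature maps stored as nested lists; in SiamFC the two inputs would instead be multi-channel features produced by a shared CNN, and the function names `cross_correlate` and `argmax_2d` are hypothetical helpers introduced only for this illustration.

```python
def cross_correlate(search, exemplar):
    """Slide `exemplar` over `search` and return the map of inner products."""
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    response = []
    for i in range(sh - eh + 1):
        row = []
        for j in range(sw - ew + 1):
            # Inner product between the exemplar and the window at (i, j).
            score = sum(
                search[i + u][j + v] * exemplar[u][v]
                for u in range(eh)
                for v in range(ew)
            )
            row.append(score)
        response.append(row)
    return response


def argmax_2d(response):
    """Return the (row, col) position of the largest response value."""
    best = max(
        (value, i, j)
        for i, row in enumerate(response)
        for j, value in enumerate(row)
    )
    return best[1], best[2]


# Toy "feature maps" (assumption: one channel, tiny sizes for readability).
exemplar_features = [[1.0, 2.0],
                     [3.0, 4.0]]
search_features = [[0.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 2.0, 0.0],
                   [0.0, 3.0, 4.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]]

response_map = cross_correlate(search_features, exemplar_features)
peak = argmax_2d(response_map)  # location where the exemplar matches best
```

The peak of the response map marks the window of the search region that is most similar to the exemplar. Because the exemplar features stay fixed in this scheme, any change in the target's appearance lowers the peak, which is the limitation the exemplar prediction model is meant to address.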
Exemplar prediction model
To address the lack of online exemplar updating in SiamNet-based trackers, this paper designs a CNN-based exemplar prediction model whose structure is shown in Fig. 3. By simplifying the generator of SRGAN [16], we construct our prediction model as a CNN with one residual block.
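According to the conclusions of this paper, the prediction model takes a 3-channel image that retains the original appearance of the pedestrian target, its current appearance, and its current surrounding context. A minimal sketch of assembling such an input is given below. It assumes that each source is a single-channel TIR patch, that the three patches are resized to a common size and stacked channel-first, and that nearest-neighbour resizing is adequate for illustration; the channel order, the patch size, and the names `resize_nearest` and `build_predictor_input` are hypothetical and not taken from the paper.

```python
def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale patch (list of rows)."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def build_predictor_input(original_exemplar, current_target, current_context, size):
    """Stack three grayscale patches into a channel-first (3, size, size) input.

    Assumed channel order: original exemplar, current target, current context.
    """
    channels = [original_exemplar, current_target, current_context]
    return [resize_nearest(channel, size, size) for channel in channels]


# Toy patches with made-up intensities.
original = [[10, 20],
            [30, 40]]
current = [[11, 19],
           [33, 41]]
context = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11],
           [12, 13, 14, 15]]

model_input = build_predictor_input(original, current, context, size=2)
```

Stacking the patches as channels lets a standard convolutional layer see the original and current appearance at the same spatial location, so the network can weigh the two when predicting the updated exemplar.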
TIR pedestrian tracking with improved SiamRPN
This paper constructs a TIR pedestrian tracker based on SiamRPN [18] because of its attractive tracking speed and accuracy. By adding a region proposal subnetwork to SiamFC, SiamRPN greatly improves tracking speed and accuracy and handles the scale variations that SiamFC struggles with. However, it still does not update the exemplar image during tracking. To further enhance the tracking performance, we introduce our exemplar prediction model into the original SiamRPN. As shown in
Experimental results and analysis
The proposed TIR pedestrian tracker was implemented in Python 3.7 with the PyTorch 0.4.1 framework on a computer with an Intel i7-8700K 3.70 GHz CPU, 64 GB RAM, Ubuntu 18.04, and a single NVIDIA RTX 2080 Ti GPU. After being trained independently with the Adam optimizer, the exemplar prediction model is connected to a trained SiamRPN [18] provided by PySOT to obtain our TIR pedestrian tracker, as shown in Fig. 5.
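The comparison with other trackers is reported in terms of tracking success rate and tracking precision. The sketch below shows how such metrics are commonly computed. It assumes the widely used definitions in which a frame counts as a success when the overlap (IoU) between the predicted and ground-truth boxes exceeds a threshold, and as precise when the center location error is within a pixel threshold; the threshold values 0.5 and 20 pixels, the `(x, y, w, h)` box format, and all function names are assumptions for illustration rather than the exact protocol used in the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_rate(predicted, ground_truth, iou_threshold=0.5):
    """Fraction of frames whose IoU exceeds `iou_threshold`."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


def precision(predicted, ground_truth, pixel_threshold=20.0):
    """Fraction of frames whose center error is within `pixel_threshold`."""
    hits = sum(center_error(p, g) <= pixel_threshold
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


# Toy four-frame sequence with made-up boxes; the tracker drifts in frame 3.
ground_truth = [(10, 10, 20, 40), (12, 10, 20, 40), (14, 10, 20, 40), (16, 10, 20, 40)]
predicted = [(10, 10, 20, 40), (14, 12, 20, 40), (60, 10, 20, 40), (16, 10, 20, 40)]

seq_success = success_rate(predicted, ground_truth)
seq_precision = precision(predicted, ground_truth)
```

In benchmark reports these quantities are usually swept over a range of thresholds to produce success and precision plots; the single-threshold version here only illustrates the per-frame test.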
Conclusions
After analyzing the characteristics of TIR pedestrian tracking as well as the weak points of SiamNet-based trackers, we design a CNN-based prediction model that provides an online-updated exemplar for SiamRPN tracking, resulting in an improved SiamRPN tracker. Our exemplar prediction model takes as input a composite 3-channel image that retains the original appearance of a pedestrian target, its current appearance, and its current surrounding context.
The experimental results on PTB-TIR show that our
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by the National Natural Science Foundation of China (Grant No. 61771155).
References (43)
- Pedestrian detection and tracking in infrared imagery using shape and appearance. Comput. Vision Image Underst. (2007)
- Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vision Image Underst. (2007)
- Pedestrian detection in thermal images: an automated scale based region extraction with curvelet space validation. Infrared Phys. Technol. (2016)
- Deep infrared pedestrian classification based on automatic image matting. Appl. Soft Comput. (2019)
- Deep convolutional neural networks for thermal infrared object tracking. Knowl. Based Syst. (2017)
- Detection of pedestrians in far-infrared automotive night vision using region-growing and clothing distortion compensation. Infrared Phys. Technol. (2010)
- On pedestrian detection and tracking in infrared videos. Pattern Recognit. Lett. (2012)
- Modified particle filter-based infrared pedestrian tracking. Infrared Phys. Technol. (2010)
- Benchmarking a large-scale FIR dataset for on-road pedestrian detection. Infrared Phys. Technol. (2019)
- Robust thermal infrared object tracking with continuous correlation filters and adaptive feature fusion. Infrared Phys. Technol. (2019)
- Fully-convolutional siamese networks for object tracking
- A unified multi-scale deep convolutional neural network for fast object detection. Eur. Conf. Comput. Vision
- Learning spatially regularized correlation filters for visual tracking
- Accurate scale estimation for robust visual tracking. British Mach. Vision Conf.
- A two-stage template approach to person detection in thermal imagery
- ImageNet: a large-scale hierarchical image database
- High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell.
- Transfer learning based visual tracking with Gaussian processes regression. Eur. Conf. Comput. Vision
- The sixth visual object tracking VOT2018 challenge results
- The visual object tracking VOT2014 challenge results
Cited by (10)
Target-Cognisant Siamese Network for Robust Visual Object Tracking
2022, Pattern Recognition Letters
Citation excerpt: The pioneering studies, SINT [7] and SiamFC [8], formulate the task as a matching problem, and use a Siamese network to extract the target and instance representations for measuring their similarity. Inspired by that, many Siamese trackers [9–14] have been proposed, producing increasingly promising results. These methods can be divided into two main categories: anchor-based and anchor-free methods.
Scale and appearance variation enhanced siamese network for thermal infrared target tracking
2021, Infrared Physics and Technology
Citation excerpt: Zheng et al. proposed an improved tracker based on convolutional neural network and siamese region proposal network. The exemplar of pedestrian was produced via constructed CNN-based prediction model, and then the SiamRPN was further utilized to achieve accurate pedestrian tracking results in real-time [19]. With the help of deep convolutional network, the tracking accuracy and robustness are improved.
Multiple pedestrian tracking under first-person perspective using deep neural network and social force optimization
2021, Optik
Citation excerpt: Wojke et al. [17] extended the work of simple online and real-time tracking (SORT) algorithm by extracting the apparent features of the target for nearest neighbor matching, which effectively improves the ID-switch problem caused by occlusion. Zheng et al. [18] constructed a CNN-based prediction model to produce the exemplar of a pedestrian target, then combine with SiamRPN to form an improved real-time pedestrian tracker. In summary, the accuracy of pedestrian tracking is directly determined by three factors: the performance of pedestrian detection, the efficiency of feature extraction, and robustness.
Thermal Infrared Target Tracking: A Comprehensive Review
2024, IEEE Transactions on Instrumentation and Measurement
Two-Level Cascade Model for Tracking Pedestrians Using Thermal Infrared Video Information
2023, Proceedings of the Romanian Academy Series A - Mathematics Physics Technical Sciences Information Science
Cloud Detection using SDGSAT-1 Thermal Infrared Data
2023, Proceedings of SPIE - The International Society for Optical Engineering
Handled by Associate Editor Antonio Fernández-Caballero.