Elsevier

Pattern Recognition Letters

Volume 140, December 2020, Pages 66-72

Thermal infrared pedestrian tracking using joint siamese network and exemplar prediction model

https://doi.org/10.1016/j.patrec.2020.09.022

Highlights

  • A CNN-based exemplar prediction model is designed to enhance the original Siam-RPN.

  • Both the temporal and spatial information around an object are introduced to the prediction model.

  • The proposed tracker performs better than the existing trackers on the PTB-TIR benchmark.

  • The proposed tracker runs in real-time.

Abstract

Tracking pedestrian targets over a thermal infrared (TIR) image sequence is a hot topic in visual tracking. The imaging characteristics of TIR targets, such as low target-background contrast and long imaging distance, make TIR object tracking very difficult. In this paper, we design an improved TIR pedestrian tracker based on a convolutional neural network (CNN) and the siamese region proposal network (SiamRPN). By fully considering the temporal and spatial information around an object, we first construct a CNN-based prediction model to produce the exemplar of a pedestrian target. The predicted exemplar is then combined with SiamRPN to form an improved real-time TIR pedestrian tracker. The proposed tracker is evaluated on the TIR pedestrian tracking benchmark dataset PTB-TIR. Our experimental results demonstrate that the proposed tracker achieves promising tracking performance. In terms of tracking success rate and precision, our tracker outperforms traditional trackers such as KCF and state-of-the-art trackers such as SiamRPN, SRDCF, and DSST. Moreover, like other siamese-network-based trackers, our tracker runs in real time.

Introduction

Pedestrian tracking has been a hot topic in visual tracking in the past few years. Recently, the falling price and rising image quality of thermal infrared (TIR) cameras have made pedestrian tracking over TIR image sequences quite popular. Because TIR images are formed from the radiation emitted by observed objects, TIR-based pedestrian trackers can work in low light and even in total darkness. Moreover, TIR pedestrian trackers are robust to illumination changes and shadows and adapt well to the environment. TIR pedestrian tracking is now applied across a broad spectrum of computer vision tasks, from intelligent vehicles to battlefield reconnaissance. However, due to the inherent characteristics of TIR images (e.g., the absence of color and texture information, energy attenuation, and sensor noise), TIR pedestrian tracking is still challenging.

Traditional filtering-based techniques are very popular for TIR pedestrian tracking. Two improved TIR pedestrian trackers that integrate multiple cues of a target with a particle filter (PF) are presented in [34,36]. A PF-based algorithm for tracking multiple TIR pedestrian targets, built on partial least squares regression and heuristic computation, is proposed in [39]. Kalman filtering is adopted in TIR automotive night vision to aid temporal association between frames [23]. A high-performance TIR object tracker based on continuous correlation filters and adaptive feature fusion is proposed in [42], while median filtering is adopted in TIR intelligent surveillance in [37]. In addition, support vector machines [3,15], histograms of oriented gradients [30], and sparse representation [19,25] are also widely used in TIR pedestrian tracking.

Traditional methods can generally track a TIR pedestrian who has a clear shape and other detailed features. In most cases, however, TIR targets lack shape, texture, and color information and have low contrast against the background, which greatly degrades the performance of a traditional tracker. Recently, deep architectures have brought impressive advances in computer vision tasks and have become very popular in object tracking because of their robustness to external disturbances such as illumination changes, occlusion, and motion blur. Encouraged by these achievements, some studies have introduced deep models to TIR object tracking. Xu et al. performed several experiments on the SCUT dataset [40], showing that convolutional neural network (CNN)-based methods, such as Faster-RCNN [28] and MSCNN [2], perform well on TIR pedestrian detection. Liang et al. introduced automatic matting to a CNN model to identify pedestrians against cluttered backgrounds [20]. Since 2017, siamese network (SiamNet)-based trackers have attracted much attention for their ultra-fast speed and high accuracy. Nowadays, SiamNet variants, including the fully convolutional siamese network (SiamFC) [1], CFNet [33], and SiamRPN [18], are very common in visible object tracking. For TIR object tracking, Shen et al. designed a TIR multi-pedestrian tracker in [31] based on faster regions with CNN features [28] and an improved SiamFC, and reported promising tracking results.

However, neither the basic SiamFC nor the popular SiamRPN updates the exemplar (template) of a tracked target during tracking. In other words, they do not fully exploit the temporal information in a video, which greatly limits the tracking performance of SiamNet-based trackers, especially when tracking non-rigid objects such as pedestrians. To further improve TIR pedestrian tracking performance, this paper proposes a CNN-based exemplar prediction model that fully utilizes the spatial and temporal information around a TIR pedestrian target. Then, considering the high tracking accuracy and fast speed of SiamRPN, we integrate our prediction model with SiamRPN to form an improved real-time TIR pedestrian tracker. The experimental results on the state-of-the-art benchmark dataset PTB-TIR [21] show the strong advantages of our tracker over popular trackers such as the scale correlation filter DSST [5], kernelized correlation filters (KCF) [9], and SiamRPN [18] in terms of tracking success rate and tracking precision.

The remainder of this paper is organized as follows. Section 2 introduces the basic idea and characteristics of SiamNet-based tracking. Section 3 describes the proposed CNN-based exemplar prediction model, and Section 4 gives the details of the improved SiamRPN for tracking TIR pedestrians. Section 5 presents experimental results and discussion, and Section 6 concludes the paper.

SiamNet-based tracking and its limitations

The pioneering work on using a SiamNet for object tracking was done by Tao et al. [32], but their tracker cannot run in real time. Bertinetto et al. then proposed a fully convolutional siamese network (SiamFC) that was trained offline, end-to-end, on the ILSVRC15 dataset [1]. They showed that the AlexNet-based SiamFC reached a tracking speed of 65 frames per second (fps).

The basic idea of the SiamFC [1] is to adopt deep CNNs and cross-correlation to implement similarity learning,
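As a rough illustration of the cross-correlation step, the sketch below slides an exemplar map over a larger search map and records the inner product at each offset; the peak of the response map marks the best-matching location. It uses small plain-Python integer grids in place of CNN feature maps, and the helper names `cross_correlate` and `argmax_2d` are hypothetical, not taken from SiamFC.

```python
def cross_correlate(search, exemplar):
    """Slide `exemplar` over `search`; return the response map over valid offsets."""
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    response = []
    for i in range(sh - eh + 1):
        row = []
        for j in range(sw - ew + 1):
            # Inner product between the exemplar and the window at offset (i, j).
            score = sum(
                search[i + u][j + v] * exemplar[u][v]
                for u in range(eh)
                for v in range(ew)
            )
            row.append(score)
        response.append(row)
    return response


def argmax_2d(grid):
    """Return (row, col) of the largest value in a 2-D list."""
    best = max((value, r, c) for r, row in enumerate(grid) for c, value in enumerate(row))
    return best[1], best[2]


# Toy single-channel "features": the exemplar pattern sits at offset (1, 2).
exemplar = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4]]

response = cross_correlate(search, exemplar)
peak = argmax_2d(response)
print(response)  # [[0, 4, 11], [0, 14, 30]]
print(peak)      # (1, 2)
```

In an actual siamese tracker both inputs would be multi-channel feature maps produced by the same CNN, and the correlation would run as a convolution on the GPU; the toy grids here only show the arithmetic.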

Exemplar prediction model

To address the lack of online exemplar updating in SiamNet-based trackers, this paper designs a CNN-based exemplar prediction model whose structure is shown in Fig. 3. We construct the prediction model as a CNN with one residual block, obtained by simplifying the generator of SRGAN [16].

TIR pedestrian tracking with improved SiamRPN

This paper builds a TIR pedestrian tracker on SiamRPN [18] because of its attractive tracking speed and accuracy. By adding a region proposal subnetwork to SiamFC, SiamRPN greatly improves tracking speed and accuracy and solves SiamFC's problem with scale variations. However, it still does not update the exemplar image during tracking. To further enhance tracking performance, we introduce our exemplar prediction model into the original SiamRPN. As shown in

Experimental results and analysis

The proposed TIR pedestrian tracker is implemented in Python 3.7 with the PyTorch 0.4.1 framework on a computer with an Intel i7-8700K 3.70 GHz CPU, 64 GB RAM, Ubuntu 18.04, and a single Nvidia RTX 2080 Ti GPU. After being trained independently with the Adam optimizer, the exemplar prediction model is connected to a trained SiamRPN [18] provided by PySOT to form our TIR pedestrian tracker, as shown in Fig. 5.
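The success rate and precision used to compare trackers can be sketched as follows. This is a minimal example that assumes the common OTB-style definitions: a frame counts toward success when the overlap (IoU) between the predicted and ground-truth boxes exceeds a threshold, and toward precision when the center location error is within a pixel threshold. The thresholds (0.5 and 20 pixels), the `(x, y, width, height)` box layout, and all function names are assumptions for illustration and are not taken from the paper; benchmark toolkits usually sweep the thresholds to draw full success and precision plots.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def success_rate(predicted, ground_truth, iou_threshold=0.5):
    """Fraction of frames whose overlap exceeds `iou_threshold`."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


def precision(predicted, ground_truth, pixel_threshold=20.0):
    """Fraction of frames whose center error is within `pixel_threshold`."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


# Four made-up frames: exact hit, small drift, large drift, complete loss.
ground_truth = [(10, 10, 20, 40), (12, 10, 20, 40), (14, 10, 20, 40), (16, 10, 20, 40)]
predicted = [(10, 10, 20, 40), (14, 12, 20, 40), (30, 10, 20, 40), (100, 100, 20, 40)]

print(success_rate(predicted, ground_truth))  # 0.5
print(precision(predicted, ground_truth))     # 0.75
```

The third frame shows why the two measures differ: a 16-pixel drift still counts as precise at a 20-pixel threshold, but the overlap has already dropped to about 0.11, so the frame fails the success test.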

Conclusions

After analyzing the characteristics of TIR pedestrian tracking and the weak points of SiamNet-based trackers, we design a CNN-based prediction model that supplies an exemplar updated online during SiamRPN tracking, resulting in an improved SiamRPN tracker. Our exemplar prediction model takes a composite 3-channel image as input, which retains the original appearance of the pedestrian target, its current appearance, and its current surrounding context.
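One way to picture such an input is to stack three equally sized grayscale patches as the channels of a single image. The sketch below assumes one channel per cue and a channels-first layout; how the patches are actually cropped, resized, and ordered is not specified here, so `stack_channels` and the toy 2 x 2 patches are purely hypothetical.

```python
def stack_channels(original_patch, current_patch, context_patch):
    """Stack three equally sized grayscale patches into a 3 x H x W nested list."""
    patches = [original_patch, current_patch, context_patch]
    height, width = len(original_patch), len(original_patch[0])
    for patch in patches:
        if len(patch) != height or any(len(row) != width for row in patch):
            raise ValueError("all patches must share the same height and width")
    # Copy the rows so later edits to the inputs do not alter the stacked image.
    return [[list(row) for row in patch] for patch in patches]


# Toy 2 x 2 patches standing in for the three cues (assumed order).
original_patch = [[9, 9], [9, 9]]  # target appearance in the first frame
current_patch = [[7, 8], [8, 7]]   # target appearance in the current frame
context_patch = [[1, 2], [3, 4]]   # surroundings of the target in the current frame

model_input = stack_channels(original_patch, current_patch, context_patch)
print(len(model_input), len(model_input[0]), len(model_input[0][0]))  # 3 2 2
```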

The experimental results on PTB-TIR show that our

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Grant No. 61771155).

References (43)

  • L. Bertinetto et al. Fully-convolutional siamese networks for object tracking.

  • Z. Cai et al. A unified multi-scale deep convolutional neural network for fast object detection. Eur. Conf. Comput. Vision (2016).

  • M. Danelljan et al. Learning spatially regularized correlation filters for visual tracking.

  • M. Danelljan et al. Accurate scale estimation for robust visual tracking. British Mach. Vision Conf. (2014).

  • J.W. Davis et al. A two-stage template approach to person detection in thermal imagery.

  • J. Deng et al. ImageNet: a large-scale hierarchical image database.

  • J.F. Henriques et al. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. (2015).

  • G. Jin et al. Transfer learning based visual tracking with Gaussian processes regression. Eur. Conf. Comput. Vision (2014).

  • D. Kingma and J. Ba. Adam: a method for stochastic optimization (2017)...

  • M. Kristan et al. The sixth visual object tracking VOT2018 challenge results.

  • M. Kristan et al. The visual object tracking VOT2014 challenge results.


Handled by Associate Editor Antonio Fernández-Caballero.
