Elsevier

Expert Systems with Applications

Volume 62, 15 November 2016, Pages 17-31

Video-based tracking of vehicles using multiple time-spatial images

https://doi.org/10.1016/j.eswa.2016.06.020

Highlights

  • Low-complexity video-based ‘tracking-by-detection’ system for vehicles in traffic

  • Use of multiple time-spatial images for correction of data association in tracking

  • A new metric of tracking performance when ground truths of trajectories are absent

  • Experiments on two publicly available databases to compare tracking performance

  • Proposed method outperforms the stochastic trackers in varying weather conditions

Abstract

Vehicle tracking for video-based intelligent traffic management systems is known to bring significant socioeconomic benefits. A reliable vehicle tracking method is needed to monitor traffic parameters such as the average speed, unusual movements, and congestion of vehicles, or even to detect accidents automatically on highways or freeways. The challenges of traditional video-based vehicle tracking methods include initializing tracking for an unknown number of targets and reducing the tendency of targets to drift from their true positions, which is mainly caused by variations in lighting conditions, occlusions, and camera position. To address these challenges, this paper presents a novel vehicle tracking method for a traffic management system that introduces detection based on multiple time-spatial images (MTSIs) into stochastic filter-based tracking. The MTSI-based tracking employs the concept of multiple key vehicular frames (KVFs) for each vehicular object in the traffic. These KVFs provide highly accurate positional information about the vehicles, because in them the shape and texture of the vehicles are comparable on the same scale and do not depend on the speed of the traffic. The spatial correspondence of a vehicle in successive KVFs is then incorporated as a low-complexity data association technique that alleviates the common problem of drift in stochastic filter-based methods and thereby increases the accuracy of the tracking trajectories. Comprehensive experiments are carried out on two publicly available video databases (EBVT and GRAM-RTM), which contain traffic in varying environments, to compare the vehicle tracking performance of the proposed method with that of existing methods.
Experimental results demonstrate that the introduction of MTSIs not only automates the initialization of tracking but also significantly increases the accuracy of the tracking trajectories of vehicles on roads, as evaluated both in the presence and in the absence of ground truths.

Introduction

Fatalities and serious injuries caused by traffic-related accidents are globally recognized as a serious and growing problem due to the increasing use of automobiles (National Highway Traffic Safety Administration, 2011). In order to reduce traffic-related accidents and improve the safety and smoothness of driving on highways and freeways, the development of intelligent traffic management systems that automatically track vehicles on roads has been an active area of research for the past two decades (Sun, Bebis, & Miller, 2006). Further, automatic vehicle tracking is required in numerous transportation-related surveillance applications, including road monitoring to acquire well-known traffic parameters such as the average speed per vehicle category, crossing of road markings, congestion of vehicles, detection of foreign objects, and inference of suspicious activities on the road. Such an expert tracking system, when embedded in vehicles, can also provide safety-aware instructions for driver assistance or even support the development of driverless intelligent transportation systems.

There exist three major approaches to vehicle tracking, viz., wireless distributed sensor network-based tracking (Brooks, Ramanathan, Sayeed, 2003, Duarte, Hu, 2004), remote sensing-based tracking, e.g., radar sensing (Gunnarsson, Svensson, Danielsson, Bengtsson, 2007, Tokoro, Kuroda, Kawakubo, Fujita, Fujinami, 2003) and lidar sensing (Galceran, Olson, Eustice, 2015, Premebida, Monteiro, Nunes, Peixoto, 2007, Weigel, Lindner, Wanielik, 2009), and video-based tracking (Mandellos, Keramitsoglou, Kiranoudis, 2011, McCall, Trivedi, 2006, Sivaraman, Trivedi, 2013). Since the performance of sensor network-based tracking methods is highly dependent on the availability of controlling radio frequency signals or sensors, such methods fail to provide instantaneous assistance when the sensor-mounted vehicles are out of range of the network or when a surrounding vehicle carries no sensor. Remote sensing-based tracking methods transmit radar or laser signals from moving vehicles and estimate the positions of surrounding objects from the corresponding received signals (Hoogendoorn, Zuylen, Schreuder, Gorte, & Vosselman, 2003). In many cases, the costly radar or lidar sensors fail to provide the field of view required for vehicle tracking, and, owing to their high sensitivity to environmental noise, such methods very often fail to classify the moving objects (Sivaraman & Trivedi, 2013). By contrast, video-based tracking techniques capture video frames using noise-robust and ultra-fast CCD or CMOS cameras that usually have a wide field of view around the vehicle. Table 1 summarizes the strengths and weaknesses of the sensor network, remote sensing, and video-based vehicle tracking approaches. In general, video-based tracking methods are preferred to the others, since they provide very accurate estimates of the relative positions and classes of surrounding vehicles, even in remote areas (McCall & Trivedi, 2006).

Traditional video-based vehicle tracking techniques follow two approaches: one treats the identification of vehicular objects in a frame and the estimation of the correspondences of the objects in successive frames independently (Amer, 2005, Kim, 2008), while the other treats both issues jointly (Aksel, Acton, 2010, Dellaert, Thorpe, 1998, Isard, Blake, 1998a, Segall, Chen, Acton, 1999, Stauffer, Grimson, 1999). For simultaneous tracking of multiple vehicles, the second approach is very often preferred to obtain higher accuracy of the estimated trajectories. To identify the vehicles and find their positions in successive frames, the simplest approach is template matching, which assumes rigid-body movement of the objects (Brunelli, 2009). If the pose of the object changes, the classical Lucas-Kanade affine tracker can be employed (Lucas & Kanade, 1981). To obtain a tracking system that is robust to changes in the shapes and viewing positions of the vehicles, the corners or points of interest of the deformable vehicular objects are determined, and features such as histograms of oriented gradients (HOGs) (Niknejad, Takeuchi, Mita, McAllester, 2012, Olmedo, Sastre, Bascon, Caballero, 2013), speeded up robust features (SURFs), scale invariant feature transforms (SIFTs) (Lu, Izumi, Teng, Wang, 2014, Mantripragada, Trigo, Martins, Fleury, 2013, Shi, Tomasi, 1994) and binary robust invariant scalable keypoint (BRISK) features (Hassannejad, Medici, Cardarelli, & Cerri, 2015) are extracted at these points. Because the vehicles are identified and their positions are estimated using HOG, SURF, SIFT or BRISK features computed at the points of interest only, the trajectories estimated by these methods may not be satisfactory under occlusions or in noisy environments.
By contrast, standard statistical techniques applied to the entire set of pixels of moving objects have in general been more successful at jointly recognizing vehicles in the frames and estimating their positions in successive frames.
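The template-matching baseline mentioned above can be sketched as an exhaustive sum-of-squared-differences (SSD) search. This is a minimal sketch, assuming grayscale frames stored as 2-D Python lists and a purely translational (rigid-body) displacement; the names ssd and match_template and the toy frame are hypothetical, and a practical system would use an optimized library routine instead.

```python
def ssd(patch, template):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum((p - t) ** 2
               for prow, trow in zip(patch, template)
               for p, t in zip(prow, trow))


def match_template(frame, template):
    """Exhaustively search for the top-left offset (row, col) minimizing SSD.

    Assumes the object only translates between frames (rigid-body motion).
    Returns ((row, col), best_score).
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best:
                best, best_pos = score, (r, c)
    return best_pos, best


# Hypothetical 5x5 frame containing a bright 2x2 "vehicle" at rows 2-3, cols 3-4.
frame = [[0] * 5 for _ in range(5)]
for r in (2, 3):
    for c in (3, 4):
        frame[r][c] = 9

print(match_template(frame, [[9, 9], [9, 9]]))  # ((2, 3), 0)
```

Because only the offset is searched, any change of pose or scale raises the SSD at the true location, which is why the text turns to affine and feature-based trackers next.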

Among statistical methods, moving vehicles in traffic video are modelled in a state-space framework, in which the pixel intensities of the vehicular objects are assumed to follow certain random processes. Measurements for the random processes include the position, velocity, acceleration, and color histogram of the vehicular objects in the frames (Lee, Ryoo, Riley, & Aggarwal, 2009). The stereo depths of the moving objects (Ess, Leibe, Schindler, Gool, 2009, Zhu, Yuan, Zheng, Ewing, 2012) and the scene geometry or viewing conditions (Olmedo et al., 2013) are also used to increase the accuracy of tracking trajectories. To solve the state-space model, the density function of the random processes can be chosen to be parametric or non-parametric. Kalman filter (KF)-based vehicle tracking is the most popular parametric approach (Chiverton, 2012, Shantaiya, Verma, Mehta, 2015, Sivaraman, Trivedi, 2013); it obtains an analytical solution for tracking by assuming linear dynamics of vehicular movements and Gaussian-distributed intensities of vehicular objects (Shalom, Li, & Kirubarajan, 2001). Due to variations in traffic, weather or viewing conditions, the intensities of vehicular objects may follow non-Gaussian statistics and their movements may follow nonlinear dynamics. For such cases, advanced versions of the KF, including the extended Kalman filter (EKF), have been proposed (Li, Wang, Wang, Li, 2010, Mandellos, Keramitsoglou, Kiranoudis, 2011, Simon, Chia, 2002).
In complex traffic environments, non-parametric approaches that use the Gaussian mixture density function (Stauffer & Grimson, 1999), the kernel-based mean-shift filter (Comaniciu, Ramesh, & Meer, 2003), or the particle filter (PF) (Aksel, Acton, 2010, Chan, Huang, Fu, Hsiao, Lo, 2012, Isard, Blake, 1998a, Isard, Blake, 1998b, Liu, Li, Wang, Ni, 2015) have also been adopted to describe the nonlinear and non-Gaussian random processes of moving objects. To improve tracking performance, color cues (Barcellos, Bouvie, Escouto, Scharcanski, 2015, Lehuger, Lechat, Perez, 2006, Nummiaro, K-Meier, Gool, 2002, Yin, Zhang, Sun, Gu, 2011) or edge features (Kumar & Sivanandam, 2012) have also been used in the traditional PF in addition to the pixel intensities of the moving objects. Other statistical tracking algorithms include the nearest neighbor (NN), multiple hypothesis tracking (MHT) (Cox, Hingorani, 1996, Kim, Li, Ciptadi, Rehg, 2015, Zulkifley, Moran, 2012) and the joint probabilistic data association filter (JPDAF) (Shalom, Fortmann, & Cable, 1990). NN-based methods are unreliable for tracking in cluttered environments (Raol, 2010). MHT-based data association relies on the enumeration of hypotheses, whose number can grow exponentially when all possibilities are considered (Oh, Russell, & Sastry, 2004). Due to the nature of sequential tracking, JPDAF-based methods in general perform better than MHT-based methods (Shalom, Daum, & Huang, 2009). However, JPDAF-based methods are limited by their assumption of a fixed number of targets and by their inability to initiate or terminate a track. Since the whole set of feasible observation-to-track assignments is required in the JPDAF, the computational complexity grows exponentially with the number of targets, and the problem becomes even more severe when the JPDAF is combined with PFs (Ekman, 2008).
Due to their combinatorial nature, MHT and JPDAF-based methods are usually preferred for tracking a few objects over a small number of consecutive frames or for tracking a single object over a long period (Ess et al., 2009).
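To make the parametric case concrete, here is a minimal sketch of one predict/update cycle of a constant-velocity KF tracking a vehicle's 1-D position along the road. The state layout, the simplified process noise Q = qI, the measurement noise r, and the function name kf_step are illustrative assumptions, not the formulation used by any of the cited trackers.

```python
def kf_step(x, P, z=None, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    x: [position, velocity]; P: 2x2 covariance as nested lists;
    z: scalar position measurement, or None to predict only.
    Q is simplified to q * I (an assumption made for brevity).
    """
    # Predict: x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
    p, v = x
    x = [p + dt * v, v]
    (a, b), (c, d) = P
    P = [[a + dt * (b + c) + dt * dt * d + q, b + dt * d],
         [c + dt * d, d + q]]
    if z is None:
        return x, P
    # Update with H = [1, 0]: only the position is observed.
    s = P[0][0] + r                    # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    y = z - x[0]                       # innovation
    x = [x[0] + k0 * y, x[1] + k1 * y]
    P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
         [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x, P


# Hypothetical example: a vehicle moving 2 px/frame, observed without noise.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 31):
    x, P = kf_step(x, P, z=2.0 * t)
print(x)  # position close to 60, velocity close to 2
```

The linear transition matrix and the Gaussian noise model are exactly the assumptions the text says can break down under changing traffic or viewing conditions, which motivates the EKF and PF variants.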

The inherent challenge of conventional stochastic filter-based vehicle tracking methods is twofold. First, the methods are highly sensitive to drift from the true position, especially in long sequences in which occlusions of vehicles, changes in lighting conditions due to shadows, and sudden turns or bumps of vehicles are common. Second, such methods are semi-supervised, since the initial coordinates of a vehicular object in a video are required as input to start tracking. Because of this requirement for accurate initialization of vehicular targets, the automatic tracking of an unknown number of vehicles in a complex scene is difficult for existing stochastic filter-based techniques. In this context, the tracking-by-detection approach has become one of the effective solutions for initializing a track as well as keeping a moving object within the track (Chien, Chan, Tseng, Chen, 2013, Ess, Leibe, Schindler, Gool, 2009, Ess, Schindler, Leibe, Gool, 2010, Gavrila, Munder, 2007). For example, a correct motion model is estimated from a set of candidate trajectories, generated by EKF trackers from different observations, by using the semantic information of the object category (Ess et al., 2010). In the active learning vehicle recognition and tracking (ALVeRT) framework (Sivaraman & Trivedi, 2010), vehicles are recognized using Haar-like wavelet features with an Adaboost classifier and then tracked by the PF. In Shen and Miao (2014), tracklets are defined using Gaussian mixture model-based spatio-temporal features of objects in a few consecutive frames, and the association of reliable tracklets is used for object tracking. Battiato et al. (2015) reported a tracking-by-detection method wherein a supervised patch-based learning algorithm is employed to classify a vehicle into one of two types, followed by multicorrelation-based template matching for tracking the vehicle. Kim et al. (2015) integrated the MHT algorithm into a tracking-by-detection framework that incorporates long-term appearance models of vehicles represented by features obtained from a deep convolutional neural network. Zhao, Xia, Xu, Shi, and Liu (2016) reported a vehicle tracking method, referred to as adaptive partial occlusion segmentation (APPOS), that estimates the optical flow of the contours of occluded objects, which are identified by comparing the foreground and background of the scene. In event-based video summarization (Song et al., 2016), vehicles are first identified by a deformable part-based model, and their trajectories are then estimated using a greedy data association framework. Surveys of various object tracking methods, which may or may not be applied to vehicular traffic, can be found in Ali et al. (2016); Wu, Lim, and Yang (2013); Yilmaz, Javed, and Shah (2006).

Table 2 summarizes the notable strengths and weaknesses of existing video-based tracking approaches, including the feature-based, parametric and non-parametric intensity-based, and tracking-by-detection approaches. In general, the tracking-by-detection approach has shown significant success, compared with the others, in tracking multiple vehicles in long sequences. However, its success depends strongly on how accurately the objects are identified across frames. Moreover, this approach requires vehicles to be detected in every frame of the video, which in general imposes a significant additional computational load on the tracking system. Hence, there remains scope for integrating a highly accurate and computationally efficient vehicle detection technique into the well-known parametric or non-parametric intensity-based tracking approaches so that the performance of the overall tracking system can be improved significantly.

This paper focuses only on the tracking of vehicles in road traffic, although many tracking methods in the literature deal with moving objects of other categories, such as humans, machines, and fish, or with moving appearances such as faces and gestures. The major contributions of this work are as follows:

  • We introduce multiple time-spatial images (MTSIs), generated from a number of virtual detection lines (VDLs) and reported to be very effective in identifying vehicular objects independently of their speeds and sizes (see Rashid, Mithun, Joy, Rahman, 2010, Mithun, Rashid, Rahman, 2012), for estimating the trajectories of multiple vehicles under the tracking-by-detection principle at relatively low computational complexity.

  • We present a novel vehicle tracking system that is capable of identifying a vehicular object automatically whenever it appears in a scene.

  • We propose a new metric for evaluating the accuracy of tracking trajectories, used to compare the performance of the proposed MTSI-based vehicle tracking system with that of existing ones in the absence of ground truth data.

In particular, the MTSI-based vehicle detection method employs the concept of a key vehicular frame (KVF), from which the position of the centroid of a vehicle can be estimated accurately in a video frame, provided the traffic is reasonably straight. Thus, the accurate positions obtained from a suitable set of KVFs can be leveraged in data association to address the drift problems of existing stochastic filter-based vehicle tracking techniques. By detecting vehicles only in certain KVFs, the computational cost of the frame-wise object recognition used in existing tracking-by-detection techniques can also be reduced significantly. In addition, the KVFs generated from the MTSIs aid the initialization of the tracking coordinates of incoming vehicles in any stochastic tracker. The performance of the proposed video-based tracking system is investigated and compared with that of existing methods using two publicly available representative databases with or without annotations of the vehicular objects.
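The excerpt does not specify how KVF positions enter the data association step. As a rough illustration only, the sketch below uses a generic greedy, gated nearest-neighbor matcher between predicted track centroids and the centroids detected in a KVF, chosen purely for brevity. The function name associate, the Euclidean gate of 20 pixels, and the dict/list layout are assumptions, not the paper's method.

```python
import math


def associate(predicted, detections, gate=20.0):
    """Greedy gated nearest-neighbor association (illustrative only).

    predicted: dict mapping track_id -> predicted (x, y) centroid
    detections: list of (x, y) centroids detected in a KVF
    Returns (matches, unmatched) where matches maps track_id -> detection
    index and unmatched lists detection indices left without a track.
    """
    # All track/detection pairs, closest first.
    pairs = sorted(
        (math.dist(p, d), tid, j)
        for tid, p in predicted.items()
        for j, d in enumerate(detections)
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for dist, tid, j in pairs:
        if dist > gate:
            break  # every remaining pair is even farther apart
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched


# Hypothetical predictions for two tracks and three KVF detections.
predicted = {1: (100.0, 50.0), 2: (200.0, 52.0)}
detections = [(203.0, 50.0), (98.0, 51.0), (320.0, 40.0)]
print(associate(predicted, detections))  # ({1: 1, 2: 0}, [2])
```

In this toy run, detection 2 lies outside every gate; in a tracking-by-detection loop, such unmatched detections are the natural candidates for initializing new tracks.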

The paper is organized as follows. In Section 2, a brief review of MTSI-based vehicle detection is given. Section 3 presents the proposed tracking system that uses the MTSIs for initialization as well as for drift corrections of the trajectories obtained from stochastic trackers such as the Kalman or particle filter. Experimental results demonstrating the significance of using MTSIs for tracking of vehicular objects are given in Section 4. Finally, Section 5 provides the concluding remarks.

Section snippets

Multiple time-spatial images: a brief review

A time-spatial image (TSI) is generated by placing the pixel strips of the frames on a VDL in chronological order (Rashid et al., 2010). A VDL is in fact a set of pixel indices whose position is fixed across frames and is usually perpendicular to the motion of the vehicles. An example of a VDL on a few consecutive frames of a video sequence and the corresponding TSI for the sequence are shown in Fig. 1. It can be seen from this figure that each of the vehicles passing the VDL
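TSI generation as described above amounts to sampling the same set of pixel indices in every frame and stacking the resulting strips in time order. A minimal sketch follows, assuming grayscale frames stored as 2-D lists indexed [row][col] and a VDL given as a list of (row, col) indices; the names build_tsi and make_frame and the toy frames are hypothetical.

```python
def build_tsi(frames, vdl):
    """Stack the pixel strip lying under a VDL from each frame, in time order.

    frames: list of frames, each a 2-D list indexed [row][col]
    vdl: list of (row, col) pixel indices, identical for every frame
    Returns a TSI whose row t is the strip sampled from frame t.
    """
    return [[frame[r][c] for (r, c) in vdl] for frame in frames]


def make_frame(t, rows=5, cols=3):
    """Hypothetical frame: a 2x2 'vehicle' (value 1) moving down one row per frame."""
    return [[1 if t <= r <= t + 1 and c < 2 else 0 for c in range(cols)]
            for r in range(rows)]


frames = [make_frame(t) for t in range(4)]
vdl = [(2, c) for c in range(3)]  # horizontal VDL, perpendicular to the motion
tsi = build_tsi(frames, vdl)
for row in tsi:
    print(row)
# [0, 0, 0]
# [1, 1, 0]
# [1, 1, 0]
# [0, 0, 0]
```

In the resulting TSI, the vehicle appears as a blob spanning the frames during which it overlaps the VDL.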

MTSI-based tracking methods

Let Z (Z > 0) be the number of vehicles passing through L (L > 0) frames of size X × Y (X, Y > 0) in a traffic video. Let N (0 < N < L) be the number of VDLs placed on the video sequence such that the lines are perpendicular to the direction of traffic flow (see Fig. 3). Since each vehicle in a TSI results in a KVF, ZN KVFs are obtained from the MTSI-based vehicle identification method. Since the KVFs provide accurate information about the positions
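To illustrate why accurate KVF positions help against drift, the toy sketch below propagates a position estimate with a deliberately biased velocity and snaps it to a KVF centroid whenever one is available. The hard reset, the function name track_with_kvf_resets, and the frame indices and positions of the KVFs are assumptions made for illustration; the excerpt does not detail the paper's actual correction step. With N VDLs, each vehicle yields N KVFs, so a single track receives N such corrections.

```python
def track_with_kvf_resets(start, velocity, n_frames, kvf_positions):
    """Propagate a constant-velocity position estimate frame by frame and
    overwrite it with a KVF centroid whenever one is available.

    kvf_positions: dict frame_index -> observed position (hypothetical KVF output)
    Returns the list of estimated positions, one per frame.
    """
    estimates, pos = [], start
    for t in range(n_frames):
        if t > 0:
            pos += velocity         # prediction step (may carry a velocity bias)
        if t in kvf_positions:
            pos = kvf_positions[t]  # drift correction from a KVF
        estimates.append(pos)
    return estimates


# Hypothetical vehicle: true speed 2.0 px/frame; the tracker assumes 2.5 px/frame.
truth = [2.0 * t for t in range(40)]
drift = track_with_kvf_resets(0.0, 2.5, 40, {})
# Three VDLs (N = 3) give KVFs for this vehicle at frames 10, 20 and 30.
fixed = track_with_kvf_resets(0.0, 2.5, 40, {10: 20.0, 20: 40.0, 30: 60.0})
print(max(abs(a - b) for a, b in zip(drift, truth)))  # 19.5
print(max(abs(a - b) for a, b in zip(fixed, truth)))  # 4.5
```

Without corrections, the error grows linearly to 19.5 pixels by frame 39, whereas the resets cap it at 4.5 pixels, the drift accumulated over nine frames between consecutive KVFs.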

Experimental results

We performed several experiments to evaluate the performance of the proposed approach of tracking-by-detection of vehicles in the traffic. This section highlights the characteristics of datasets that are used to evaluate the tracking performance. The experimental setup, the criteria for performance evaluation, and the results of tracking are also presented in this section.

Conclusion and discussion

Developing efficient vehicle tracking techniques is crucial in designing video-based intelligent traffic management systems. Traditional tracking methods mainly suffer from the need to initialize tracking, sensitivity to drift from the true object position in long sequences, and the absence of any corrective mechanism in the data association of tracking. Thus, traffic parameters such as the average vehicular speed, congestion of vehicles, occurrences of events predicted by the existing tracking

Acknowledgments

The authors would like to acknowledge the Samsung R&D Institute Bangladesh for providing travel support to the first author, through which a part of the EBVT database was collected in the Republic of Korea. The authors would also like to thank the anonymous reviewers for their valuable comments, which helped improve the quality of the paper.

References (75)

  • M.A. Zulkifley et al.

    Robust hierarchical multiple hypothesis tracker for multiple-object tracking

    Expert Systems with Applications

    (2012)
  • A. Aksel et al.

    Target tracking using the snake particle filter

    Proceedings IEEE southwest symposium image analysis and interpretation. Austin, TX

    (2010)
  • A. Ali et al.

    Visual object tracking: classical and contemporary approaches

    Frontiers of Computer Science

    (2016)
  • A. Amer

    Voting-based simultaneous tracking of multiple video objects

    IEEE Transactions Circuits and Systems for Video Technology

    (2005)
  • R.R. Brooks et al.

    Distributed target classification and tracking in sensor networks

    Proceedings IEEE

    (2003)
  • R. Brunelli

    Template matching techniques in computer vision: theory and practice

    (2009)
  • Y.M. Chan et al.

    Vehicle detection and tracking under various lighting conditions using a particle filter

    IET Intelligent Transport Systems

    (2012)
  • S.Y. Chien et al.

    Video object segmentation and tracking framework with improved threshold decision and diffusion distance

    IEEE Transactions Circuits and Systems for Video Technology

    (2013)
  • J. Chiverton

    Helmet presence classification with motorcycle detection and tracking

    IET Intelligent Transport Systems

    (2012)
  • D. Comaniciu et al.

    Kernel-based object tracking

    IEEE Transactions Pattern Analysis and Machine Intelligence

    (2003)
  • T. Cox et al.

    An efficient implementation of Reid’s multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking

    IEEE Transactions Pattern Analysis and Machine Intelligence

    (1996)
  • F. Dellaert et al.

    Robust car tracking using Kalman filtering and Bayesian templates

    Proceedings SPIE conference intelligent transportation systems. Pittsburgh, PA

    (1998)
  • M. Ekman

    Particle filters and data association for multi-target tracking

    International conference information fusion. Cologne, Germany

    (2008)
  • A. Ess et al.

    Robust multi-person tracking from a mobile platform

    IEEE Transactions Pattern Analysis and Machine Intelligence

    (2009)
  • A. Ess et al.

    Object detection and tracking for autonomous navigation in dynamic environments

    International Journal of Robotics Research

    (2010)
  • M. Everingham et al.

    The PASCAL visual object classes (VOC) challenge

    International Journal Computer Vision

    (2010)
  • E. Galceran et al.

    Augmented vehicle tracking under occlusions for decision-making in autonomous driving

    Proceedings international conference intelligent robots and systems. Hamburg, Germany

    (2015)
  • D.M. Gavrila et al.

    Multi-cue pedestrian detection and tracking from a moving vehicle

    International Journal Computer Vision

    (2007)
  • J. Gunnarsson et al.

    Tracking vehicles using radar detections

    Proceedings IEEE intelligent vehicles symposium. vol. 1. Istanbul, Turkey

    (2007)
  • S.P. Hoogendoorn et al.

    Microscopic traffic data collection by remote sensing

    Transportation Research Record: Journal Transportation Research Board

    (2003)
  • M. Isard et al.

    CONDENSATION-conditional density propagation for visual tracking

    International Journal Computer Vision

    (1998)
  • M. Isard et al.

    A mixed-state condensation tracker with automatic model-switching

    Proceedings international conference computer vision. Bombay, India

    (1998)
  • Z. Kalal et al.

    Forward-backward error: Automatic detection of tracking failures

    Proceedings international conference pattern recognition. Istanbul, Turkey

    (2010)
  • C. Kim et al.

    Multiple hypothesis tracking revisited

    Proceedings IEEE international conference computer vision. Santiago, Chile

    (2015)
  • Z. Kim

    Real time object tracking based on dynamic feature grouping with background subtraction

    Proceedings IEEE conference computer vision and pattern recognition. Anchorage, AK

    (2008)
  • T.S. Kumar et al.

    Object detection and tracking in video using particle filter

    Proceedings international conference computing communication & networking technologies. Coimbatore, India

    (2012)
  • J.T. Lee et al.

    Real-time illegal parking detection in outdoor environments using 1-D transformation

    IEEE Transactions Circuits and Systems for Video Technology

    (2009)
    1

    A significant part of this work was done when the author was at BUET.
