Object tracking under large motion: Combining coarse-to-fine search with superpixels
Introduction
Appearance-based object tracking is an important topic in computer vision with various applications such as video surveillance, human behavior analysis, human-robot interaction, and transportation monitoring [14], [38]. Within this field, object tracking under large image motion, caused by large motions of the object and/or the camera, remains a challenging problem. Fig. 1 shows an exemplary case in which the estimation of the target location failed in a large search region. The search strategy plays an important role in solving this issue, and several approaches have been suggested [5], [6], [23], [34], [42], [44], [49]; however, the problem remains unsolved.
Dense sampling and particle filter approaches are widely used as search strategies to handle large motions. Dense sampling-based approaches [3], [8], [13], [21] are robust to local minima because they evaluate all possible locations in the search region with a fixed-size sliding window. However, these methods are computationally expensive because of the excessive number of samples, most of which are collected from relatively unnecessary regions. On the other hand, particle filter-based approaches [20], [34], [36], [45] are computationally efficient because they explore the search region selectively through a prediction-update process based on weighted samples generated from a posterior density distribution. Nonetheless, when many samples have accidentally converged to a false local minimum, the particle filter cannot re-converge to the global minimum of the entire search region. Therefore, particle filter methods may be more sensitive to local minima than dense sampling methods.
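To make the cost argument concrete, the following minimal sketch counts the candidate windows that an exhaustive sliding-window search would evaluate. It assumes a square search region of half-width `search_radius` pixels around the previous location and a hypothetical `stride` parameter; neither name comes from the cited trackers.

```python
def dense_window_count(search_radius, stride=1):
    """Number of sliding-window positions in a square search region.

    The region spans [-search_radius, +search_radius] in both x and y,
    and windows are placed every `stride` pixels (assumed parameters).
    """
    steps_per_axis = 2 * (search_radius // stride) + 1
    return steps_per_axis * steps_per_axis


def dense_to_particle_ratio(search_radius, n_particles, stride=1):
    """How many times more samples dense sampling needs than a particle
    filter that uses a fixed budget of `n_particles` samples."""
    return dense_window_count(search_radius, stride) / n_particles
```

Under these assumptions the number of dense samples grows quadratically with the search radius (441 windows for a radius of 10 pixels, 6561 for a radius of 40), whereas a particle filter keeps a fixed sample budget regardless of the region size.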
In this paper, to achieve computational efficiency and robustness to local minima, we propose a novel search method based on a coarse-to-fine strategy and superpixels. In the coarse step, we explore the whole search region according to the location of each superpixel and then extract the important superpixels that are associated with the tracked object; this has a lower computational cost than the dense sampling method, as shown in Section 4.3. Next, in the fine step, we perform a sampling and similarity measurement process within each selected superpixel, which gives a more accurate location of the object at the pixel level. Because the proposed search strategy evaluates all local minima by using superpixels in the coarse step, it can be more robust to local minima than the particle filter approach.
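The two-step search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the superpixel label map is taken as given (for example, from a SLIC segmentation), hue values are assumed to lie in [0, 1), the Bhattacharyya coefficient is assumed as the similarity measure, and `threshold`, `bins`, and `score_fn` are hypothetical names.

```python
import math
from collections import defaultdict


def hue_histogram(hues, bins=8):
    """Normalized histogram of hue values assumed to lie in [0, 1)."""
    hist = [0.0] * bins
    for h in hues:
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def coarse_step(labels, hue, target_hist, threshold=0.7):
    """Keep the superpixels whose hue histogram resembles the target.

    `labels` and `hue` are 2-D lists of the same shape; `labels` holds
    one superpixel id per pixel.
    """
    pixels = defaultdict(list)
    for y, row in enumerate(labels):
        for x, sp in enumerate(row):
            pixels[sp].append((x, y))
    selected = {}
    for sp, pts in pixels.items():
        hist = hue_histogram([hue[y][x] for x, y in pts], len(target_hist))
        if bhattacharyya(hist, target_hist) >= threshold:
            selected[sp] = pts
    return selected


def fine_step(selected, score_fn):
    """Score every pixel inside the selected superpixels; return the best.

    `score_fn` maps an (x, y) location to a similarity score and stands in
    for the richer appearance model used in the fine step.
    """
    candidates = [pt for pts in selected.values() for pt in pts]
    return max(candidates, key=score_fn) if candidates else None
```

In this sketch the coarse step touches each pixel once to build per-superpixel histograms and then compares one histogram per superpixel, so the expensive per-location scoring in `fine_step` runs only on pixels that survived the coarse filter.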
Within the coarse-to-fine search strategy, we also suggest a way to efficiently use two different appearance models, a generative appearance model (GAM) and a discriminative appearance model (DAM), to enhance tracking. In the coarse step, we use a simple GAM with only the hue feature for computational efficiency, which tends to detect all object candidates (true positives) and similar backgrounds (false positives) simultaneously. In the fine step, we try to find the best object location by using a combined measure from both a support vector machine (a DAM) and a histogram-bin distribution (a GAM), where hue, intensity, and local binary pattern (LBP) features are used. Here, the DAM tries to focus on the best true positive, and the GAM enhances the discrimination between true positives and false positives; as a result, the combined measure integrates the advantages of a GAM and a DAM.
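One way to picture a combined DAM and GAM measure is the sketch below. The text above does not state how the two scores are fused, so the product rule, the logistic squashing of the SVM margin, and all function and parameter names here are assumptions made for illustration only.

```python
import math


def svm_confidence(weights, bias, features):
    """Linear SVM decision value squashed to (0, 1) with a logistic function
    (the squashing is an assumption, used so both scores share a range)."""
    margin = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-margin))


def histogram_similarity(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def combined_score(dam, gam):
    """Assumed product rule: a candidate must score well under both models."""
    return dam * gam


def best_candidate(candidates, weights, bias, target_hist):
    """Return the location of the candidate with the highest combined score.

    Each candidate is a tuple (location, feature_vector, histogram).
    """
    def score(candidate):
        _, features, hist = candidate
        dam = svm_confidence(weights, bias, features)
        gam = histogram_similarity(hist, target_hist)
        return combined_score(dam, gam)

    return max(candidates, key=score)[0]
```

With a product rule, a background patch that happens to receive a high discriminative score is still penalized when its histogram differs from the target, which mirrors the stated role of the GAM in separating true positives from false positives.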
The remainder of this paper is organized as follows: in Section 2, we review the related work; in Section 3, we describe the proposed tracking algorithm; in Section 4, we present the experimental results on a variety of challenging video clips. Finally, in Section 5, we present the conclusions of this study.
Related work
The search strategy and the appearance model are important components of appearance-based tracking approaches [37]; the former determines how to find the best location and/or the best state of the target object in the search region, and the latter plays a key role in evaluating the likelihood that the object is at a particular location. In this section, we review various search strategies and appearance models.
Overview
The proposed method consists of three main components as shown in Fig. 2: coarse-to-fine search with superpixels, estimation of the object location, and selective model update. The coarse-to-fine search with superpixels aims to efficiently explore a large search region by using foreground and background separation [22]. In this method, all pixels in the search region are classified as foreground or background based on their probability, and the particle filter is used to estimate a location of
Data for tracker evaluation
We evaluated performance on several challenging video sequences from [41]. This dataset consists of 98 videos (100 objects), but we used only the 72 color videos (74 objects) and excluded the 26 gray videos (26 objects) because our method uses color information. Further, the tracking results and processing speeds of 29 different methods are provided in [41]. We compared the performance of the proposed method with ten trackers that are considered the top-ten algorithms in the one-pass evaluation of [41]: STRUCK [15]
Conclusion
In this study, we proposed a novel search method based on a coarse-to-fine strategy and superpixels to deal with the large motion caused by a target object and/or a camera. In the coarse-to-fine search, we first extract the important superpixels related to the object in the whole search region and then perform a sampling and similarity measurement process within the extracted superpixels. This approach can achieve computational efficiency and robustness to local minima. We also suggested a way to use two
Acknowledgements
This work was supported by the Technology Innovation Program (No. 10060086, A robot intelligence software framework as an open and self-growing integration foundation of intelligence and knowledge for personal service robots) funded by the Ministry of Trade, Industry & Energy (MI, Korea), and was also supported by a National Research Council of Science & Technology (NST) grant from the Korea government (MSIP) (No. CRC-15-04-KIST).
References (50)
- et al. Context tracker: Exploring supporters and distracters in unconstrained environments. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, Colorado, United States (2011)
- et al. Real-time tracking via on-line boosting. Proceedings of the British Machine Vision Conference, Edinburgh, Scotland (2006)
- et al. Color-based probabilistic tracking. Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark (2002)
- et al. Online robust non-negative dictionary learning for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia (2013)
- et al. Visual tracking via boolean map representations. Pattern Recognit. (2018)
- et al. Fast compressive tracking. IEEE Trans. Pattern Anal. Mach. Intell. (2014)
- et al. Augmenting cascaded correlation filters with spatial-temporal saliency for visual tracking. Inf. Sci. (Ny) (2019)
- et al. Robust object tracking via sparse collaborative appearance model. IEEE Trans. Image Process. (2014)
- et al. SLIC superpixels. EPFL Technical report 149300 (2010)
- Ensemble tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, California, United States (2005)