Elsevier

Information Sciences

Volume 480, April 2019, Pages 194-210

Object tracking under large motion: Combining coarse-to-fine search with superpixels

https://doi.org/10.1016/j.ins.2018.12.042

Abstract

We propose a method for tracking objects that undergo large motion in image sequences. Dense sampling and particle filtering have been widely applied to this problem; however, the former is computationally expensive, and the latter is sensitive to local minima. We address both drawbacks by introducing a novel search method based on a coarse-to-fine strategy and image superpixels. In the coarse step, we extract the superpixels associated with the target object over the entire search region using a simple generative appearance model. In the fine step, we perform sampling and similarity measurement within the selected superpixels to find the most accurate location of the target object. We also suggest a way to use a discriminative appearance model and a sophisticated generative appearance model simultaneously. Extensive experiments on a popular benchmark dataset demonstrate that the proposed method outperforms other competitive approaches and performs better in challenging scenarios such as occlusion, deformation, out-of-view, and in-plane/out-of-plane rotation.

Introduction

Appearance-based object tracking is an important topic in computer vision, with applications such as video surveillance, human behavior analysis, human-robot interaction, and transportation monitoring [14], [38]. Within this field, tracking under large image motion, caused by large motion of the object and/or the camera, remains a challenging problem. Fig. 1 shows an example in which estimating the target location fails within a large search region. The search strategy plays an important role in solving this issue, and several approaches have been suggested [5], [6], [23], [34], [42], [44], [49]; however, the problem remains unsolved.

Dense sampling and particle filtering are the search strategies most widely used to handle large motion. Dense sampling-based approaches [3], [8], [13], [21] are robust to local minima because they evaluate every possible location in the search region with a fixed-size sliding window. However, they are computationally expensive because they generate an excessive number of samples, most of which are drawn from irrelevant regions. In contrast, particle filter-based approaches [20], [34], [36], [45] are computationally efficient because they explore the search region selectively through a prediction-update process based on weighted samples drawn from a posterior density. Nonetheless, when many samples happen to converge to a false local minimum, the particle filter cannot re-converge to the global minimum of the entire search region. Therefore, particle filter methods can be more sensitive to local minima than dense sampling methods.
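The trade-off above can be illustrated with a toy sketch. It maximizes a similarity score, which is equivalent to minimizing a cost. The score map, peak locations, particle count, and noise level are all hypothetical, and the particle filter is a generic bootstrap filter with a random-walk motion model, not the specific trackers cited above. Dense search scores every position and finds the stronger peak, whereas particles initialized near the weaker distractor peak keep being resampled there.

```python
import math
import random

def toy_score(x, y):
    """Hypothetical similarity map: a weaker distractor peak and a stronger true peak."""
    distractor = 0.6 * math.exp(-((x - 20) ** 2 + (y - 20) ** 2) / 50.0)
    target = 1.0 * math.exp(-((x - 90) ** 2 + (y - 70) ** 2) / 50.0)
    return distractor + target

def dense_search(score, width, height):
    """Exhaustive sliding-window search: score every pixel position."""
    best_pos, best_val, evaluated = None, float("-inf"), 0
    for y in range(height):
        for x in range(width):
            val = score(x, y)
            evaluated += 1
            if val > best_val:
                best_pos, best_val = (x, y), val
    return best_pos, evaluated

def particle_filter(score, start, n_particles=100, n_steps=10, sigma=3.0, seed=0):
    """Basic bootstrap particle filter with a random-walk motion model."""
    rng = random.Random(seed)
    particles = [start] * n_particles
    best, evaluated = start, 0
    for _ in range(n_steps):
        # Predict: perturb every particle with Gaussian noise.
        particles = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in particles]
        # Update: weight by the score (floored so all weights stay positive).
        weights = [max(score(x, y), 1e-12) for x, y in particles]
        evaluated += n_particles
        best = particles[max(range(n_particles), key=weights.__getitem__)]
        # Resample with probability proportional to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return best, evaluated

dense_best, dense_count = dense_search(toy_score, 100, 80)   # ((90, 70), 8000)
pf_best, pf_count = particle_filter(toy_score, (20, 20))     # stays near (20, 20), 1000 evaluations
```

The dense search costs 8000 score evaluations on this 100 x 80 region but reaches the global optimum; the particle filter spends only 1000 evaluations but never escapes the distractor, which is the failure mode described above.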

In this paper, to achieve both computational efficiency and robustness to local minima, we propose a novel search method based on a coarse-to-fine strategy and superpixels. In the coarse step, we explore the whole search region at the location of each superpixel and extract the important superpixels associated with the tracked object; as shown in Section 4.3, this step requires less computation than dense sampling. In the fine step, we perform sampling and similarity measurement within each selected superpixel, which yields a more accurate, pixel-level location of the object. Because the proposed search strategy evaluates all local minima through superpixels in the coarse step, it can be more robust to local minima than the particle filter approach.
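The two-step search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the superpixel label map is replaced by a fixed grid of blocks (a real tracker would use a superpixel algorithm), the coarse likelihood map and the fine score function are hypothetical, and `coarse_to_fine_search` and `top_k` are names introduced here. Each superpixel receives one coarse score; the expensive fine score is evaluated only at pixels inside the retained superpixels.

```python
from collections import defaultdict

def coarse_to_fine_search(labels, coarse_map, fine_score, top_k=2):
    """Hypothetical sketch of a superpixel-based coarse-to-fine search.

    labels:     2D list, superpixel id of each pixel in the search region
    coarse_map: 2D list, cheap per-pixel likelihood (e.g. a hue-only model)
    fine_score: callable (x, y) -> similarity from a more expensive model
    """
    sums, counts, members = defaultdict(float), defaultdict(int), defaultdict(list)
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            sums[lab] += coarse_map[y][x]
            counts[lab] += 1
            members[lab].append((x, y))
    # Coarse step: one mean score per superpixel, keep the top_k candidates.
    ranked = sorted(sums, key=lambda lab: sums[lab] / counts[lab], reverse=True)
    selected = ranked[:top_k]
    # Fine step: evaluate the expensive score only inside the kept superpixels.
    candidates = [p for lab in selected for p in members[lab]]
    best = max(candidates, key=lambda p: fine_score(*p))
    return best, selected, len(candidates)

# 8x8 search region split into four 4x4 blocks standing in for superpixels.
labels = [[(y // 4) * 2 + (x // 4) for x in range(8)] for y in range(8)]
hue_like = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.1}  # block 2 mimics a similar-colored background
coarse_map = [[hue_like[labels[y][x]] for x in range(8)] for y in range(8)]

def fine(x, y):
    """Toy fine-level similarity peaking at the true target position (6, 1)."""
    return -((x - 6) ** 2 + (y - 1) ** 2)

best, selected, n_evaluated = coarse_to_fine_search(labels, coarse_map, fine)
# best == (6, 1), selected == [1, 2], n_evaluated == 32 instead of 64
```

In this toy case the coarse step keeps both the object block and the look-alike background block (a true positive and a false positive), and the fine score separates them while evaluating only half of the pixels.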

Within the coarse-to-fine search strategy, we also suggest a way to use two different appearance models efficiently, a generative appearance model (GAM) and a discriminative appearance model (DAM), to enhance tracking. In the coarse step, we use a simple GAM with only the hue feature for computational efficiency; this model tends to detect all object candidates (true positives) together with similar-looking background regions (false positives). In the fine step, we find the best object location using a combined measure from a support vector machine (a DAM) and a histogram-bin distribution (a GAM), using hue-intensity-local binary pattern (LBP) features. Here, the DAM focuses on the best true positive, while the GAM enhances the discrimination between true positives and false positives; as a result, the method integrates the advantages of a GAM and a DAM.
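The combined fine-step measure can be sketched as below. The excerpt does not state how the two scores are fused, so everything here is an assumption: a pre-trained linear SVM stands in for the DAM, the Bhattacharyya coefficient between normalized histograms stands in for the histogram-bin GAM, and the logistic squashing and the weight `alpha` are hypothetical choices made so the two scores share a [0, 1] range.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else [0.0] * len(hist)

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def svm_decision(weights, bias, feature):
    """Decision value w.x + b of an already-trained linear SVM (the DAM)."""
    return sum(w * f for w, f in zip(weights, feature)) + bias

def combined_score(feature, model_hist, svm_w, svm_b, alpha=0.5):
    """Hypothetical fusion of a DAM score and a GAM score for one candidate window.

    The SVM margin is squashed to (0, 1) with a logistic function so it can be
    mixed with the histogram similarity, which already lies in [0, 1].
    """
    dam = 1.0 / (1.0 + math.exp(-svm_decision(svm_w, svm_b, feature)))
    gam = bhattacharyya(normalize(feature), model_hist)
    return alpha * dam + (1.0 - alpha) * gam

# Toy 4-bin features standing in for concatenated hue-intensity-LBP histograms.
model_hist = normalize([4, 2, 1, 1])
svm_w, svm_b = [0.5, 0.2, -0.2, -0.5], 0.0
target = [4, 2, 1, 1]
distractor = [1, 1, 2, 4]
score_target = combined_score(target, model_hist, svm_w, svm_b)
score_distractor = combined_score(distractor, model_hist, svm_w, svm_b)
```

With these toy numbers the target scores higher on both components, so its combined score exceeds the distractor's; a real tracker would learn the SVM weights and tune `alpha` on data.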

The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 describes the proposed tracking algorithm; Section 4 presents experimental results on a variety of challenging video clips; and Section 5 concludes the study.


Related work

Search strategy and appearance model are important components of appearance-based tracking approaches [37]: the former determines how to find the best location and/or state of the target object in the search region, and the latter evaluates the likelihood that the object is at a particular location. In this section, we review various search strategies and appearance models.

Overview

The proposed method consists of three main components as shown in Fig. 2: coarse-to-fine search with superpixels, estimation of the object location, and selective model update. The coarse-to-fine search with superpixels aims to efficiently explore a large search region by using foreground and background separation [22]. In this method, all pixels in the search region are classified as foreground or background based on their probability, and the particle filter is used to estimate a location of
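Pixel-wise foreground/background classification by probability can be sketched with hue histograms. This is an assumption-laden illustration, not the specific model of [22]: it treats normalized object and background hue histograms as likelihoods, assumes equal priors so that P(fg | hue) = H_fg(hue) / (H_fg(hue) + H_bg(hue)), and uses 12 hue bins over [0, 360) degrees; `hue_bin`, `hue_histogram`, and `foreground_mask` are hypothetical helpers.

```python
def hue_bin(hue, n_bins=12):
    """Map a hue angle in degrees to one of n_bins equal-width bins."""
    return int(hue % 360) * n_bins // 360

def hue_histogram(hues, n_bins=12):
    """Normalized hue histogram of a list of pixel hues."""
    hist = [0.0] * n_bins
    for h in hues:
        hist[hue_bin(h, n_bins)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist

def foreground_mask(hue_image, fg_hist, bg_hist, n_bins=12, eps=1e-6):
    """Label each pixel foreground when P(fg | hue) > 0.5 (equal priors assumed)."""
    mask = []
    for row in hue_image:
        mask_row = []
        for h in row:
            b = hue_bin(h, n_bins)
            # eps makes unseen hues (both histograms empty) come out at exactly 0.5.
            p_fg = (fg_hist[b] + eps) / (fg_hist[b] + bg_hist[b] + 2 * eps)
            mask_row.append(p_fg > 0.5)
        mask.append(mask_row)
    return mask

fg_hist = hue_histogram([5, 10, 15, 12])        # reddish object pixels
bg_hist = hue_histogram([200, 210, 190, 205])   # bluish surrounding pixels
mask = foreground_mask([[10, 200], [12, 100]], fg_hist, bg_hist)
```

Here the reddish pixels are labeled foreground, the bluish pixel background, and the hue seen in neither histogram falls back to background because its probability is exactly 0.5.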

Data for tracker evaluation

We evaluated performance on the challenging video sequences of the benchmark in [41]. This dataset consists of 98 videos (100 objects), but we used only the 72 color videos (74 objects), excluding the 26 grayscale videos (26 objects), because our method relies on color information. Tracking results and processing speeds of 29 different methods are also provided in [41]. We compared the proposed method with the ten top-ranked trackers in the one-pass evaluation of [41]: STRUCK [15]
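The one-pass evaluation initializes a tracker from the ground truth in the first frame and runs it once through the sequence; results are commonly summarized by a success plot of bounding-box overlap. The sketch below computes that summary under our understanding of the usual protocol (overlap measured as intersection over union, 21 evenly spaced thresholds from 0 to 1, AUC taken as the mean success rate); the threshold count and the helper names are assumptions rather than details stated in this excerpt.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, n_thresholds=21):
    """Success-plot AUC for one sequence tracked once from the first frame.

    The success rate at threshold t is the fraction of frames whose overlap
    exceeds t; the AUC is the mean success rate over evenly spaced thresholds.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Because the comparison is strict, even a perfect tracker fails at the threshold 1.0, so its AUC under this convention is 20/21 rather than 1.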

Conclusion

In this study, we proposed a novel search method based on a coarse-to-fine strategy and superpixels to deal with large motion caused by the target object and/or the camera. In the coarse-to-fine search, we first extract important superpixels related to the object over the whole search region and then perform sampling and similarity measurement within the extracted superpixels. This approach achieves both computational efficiency and robustness to local minima. We also suggested a way to use two

Acknowledgements

This work was supported by the Technology Innovation Program (No. 10060086, A robot intelligence software framework as an open and self-growing integration foundation of intelligence and knowledge for personal service robots) funded by the Ministry of Trade, Industry & Energy (MI, Korea), and by a National Research Council of Science & Technology (NST) grant funded by the Korean government (MSIP) (No. CRC-15-04-KIST).

References (50)

  • B. Babenko et al.

    Robust object tracking with online multiple instance learning

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • D. Chen et al.

    Constructing adaptive complex cells for robust visual tracking

    Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia

    (2013)
  • K. Chen et al.

    Visual object tracking via enhanced structural correlation filter

    Inf. Sci.

    (2017)
  • D. Du et al.

    Online deformable object tracking based on structure-aware hyper-graph

    IEEE Trans. Image Process.

    (2016)
  • D. Du et al.

    Iterative graph seeking for object tracking

    IEEE Trans. Image Process.

    (2018)
  • M. Everingham et al.

    The PASCAL visual object classes (VOC) challenge

    Int. J. Comput. Vis.

    (2010)
  • H.K. Galoogahi et al.

    Correlation filters with limited boundaries

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, United States

    (2015)
  • J. Gao et al.

    P2T: Part-to-target tracking via deep regression learning

    IEEE Trans. Image Process.

    (2018)
  • J. Gou et al.

    Two-phase linear reconstruction measure-based classification for face recognition

    Inf. Sci.

    (2018)
  • K. Granstrom, M. Baum, S. Reuter, Extended object tracking: introduction, overview and applications, arXiv:1604.00970v3...
  • S. Hare et al.

    Struck: structured output tracking with kernels

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • D. Held et al.

    Learning to track at 100 fps with deep regression networks

    Proceedings of the European Conference on Computer Vision, Amsterdam, Netherlands

    (2016)
  • J.F. Henriques et al.

    Exploiting the circulant structure of tracking-by-detection with kernels

    Proceedings of the European Conference on Computer Vision, Florence, Italy

    (2012)
  • J.F. Henriques et al.

    High-speed tracking with kernelized correlation filters

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • Z. Hong et al.

    Tracking using multilevel quantizations

    Proceedings of the European Conference on Computer Vision, Zurich, Switzerland

    (2014)