Robust tracking using visual cue integration for mobile mixed images

https://doi.org/10.1016/j.jvcir.2015.04.006

Highlights

  • Maximizing observation likelihood to optimize particle weights under reflections.

  • Combining co-inference and maximum likelihood for visual cue integration.

  • Co-inference is combined with SIR to avoid the particle degeneracy problem.

  • Maximum likelihood selects the more reliable cue for co-inference fused state.

  • Motion compensation for both layer separation and prediction of the particle filter.

Abstract

The transmitted scene superposed with the scene reflected from a transparent surface produces mixed images. Although such images are ubiquitous in the real world, few methods have been devoted to tracking on them. This paper therefore proposes a robust single object tracking scheme for mixed images acquired by mobile cameras. Before tracking, layer separation decomposes each mixed image to extract its intrinsic dynamic layer. To make the tracker robust against camera motion, motion compensation is applied both to layer separation and to the prediction stage of the particle filter. To maximize the observation likelihood, and thus optimize particle weights in the face of reflections, the proposed scheme combines sequential importance resampling (SIR) based co-inference with maximum likelihood for multi-cue integration. Experimental results show that the proposed scheme effectively improves tracking accuracy on mixed images with camera motion.

Introduction

Visual tracking is essential to many computer vision applications. Previous trackers tackle problems such as occlusion, illumination variations, pose changes, cluttered backgrounds, complex trajectories, and multiple targets, but few of them address the problem of reflection interference. Mixed images contain both reflections from a transparent surface and the transmitted scene behind it. Because the transmitted scene is superposed with the reflected scene, the appearance of the target and background changes significantly in mixed images, which easily leads to inaccurate tracking. In complex and dynamic environments, multi-cue integration can improve the robustness of particle filter based trackers. Serby et al. combine multiple low-level features, including interest points, edges, and homogeneous and textured regions, into a particle filter framework for tracking [1], where the multi-feature observation likelihood is the product of the individual feature likelihoods. To maximize the discriminability of multiple cues, Yang et al. use object detectors to adapt the target observation model [2]. Li et al. propose weighted Dempster-Shafer fusion to combine evidence from different spatio-temporal SVMs (support vector machines) into the observation likelihood of a particle filter [3]. The phenomenon of co-inference was proposed by Wu and Huang in 2001 and refined in [4]. Using structured variational inference to decouple the dynamics of multiple hidden states, Wu et al. propose co-inference tracking of multiple modalities, where the variational parameters of one modality are inferred from the other modalities to maximize the observation likelihood [4]. Sparse representation can also be integrated into the particle filter for data fusion [5], [6], [7], [8]; such trackers are robust against occlusion, illumination variations, and cluttered backgrounds. Wu et al. solve an l1-regularized least squares problem to estimate sparse coefficients of the target candidate [5].

Reflection separation estimates the transmitted scene and the reflection image given a transparent surface, e.g., glass [9]. Blind source separation aims at estimating unknown source signals and the mixing matrix from a set of mixed signals [10]. In computer vision, reflection separation can make use of blind source separation to estimate source layers. Independent component analysis (ICA) based separation works under the assumptions of independent source layers and static mixing, e.g. [11]. Given two mixed images, two source images can be separated by minimizing their structural correlations, e.g. [12]. Under the sparse prior over derivative filters on natural images [13], manually marked edges can aid separation from a single image [14]. Automatic separation of a weak reflection from a single image can be achieved by imposing a smoothness constraint and reducing the structural correlation between layers [15]. In [16], several structural priors on the transmitted and reflected layers are combined, and the geometric alignment of the reflection region across multiple mixed images is optimized using the augmented Lagrangian multiplier method. Gai et al. consider the diversity of layer motions and model the transformation of reflected layers parametrically [17]. Conversely, Li et al. assume a parametric transformation for transmitted layers so that variations of reflection layers can be handled [18]. The method in [19] tracks reflection regions in video frames. Another kind of layer separation derives intrinsic images, including the illumination and reflectance images, where the input image is the product of the separated layers [13], [20]. The intrinsic image, a mid-level description of scenes, was defined by Barrow and Tenenbaum [20]. Computer vision algorithms working on such descriptions sometimes achieve better performance.
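The blind-source-separation idea behind ICA-based reflection separation can be illustrated with a toy sketch (this is a generic illustration, not code from any cited method; the function name and the grid-search strategy are our assumptions): whiten two mixed 1-D signals, then search for the rotation that maximizes non-Gaussianity of the outputs, which is the core heuristic of ICA.

```python
import numpy as np

def unmix_two(mixed, n_angles=180):
    """Toy blind separation of two mixed zero-mean signals:
    whiten the mixture, then grid-search the rotation angle that
    maximizes total non-Gaussianity (absolute excess kurtosis).

    mixed: array of shape (2, n_samples)."""
    x = mixed - mixed.mean(axis=1, keepdims=True)
    # Whitening: decorrelate and normalize variance.
    vals, vecs = np.linalg.eigh(np.cov(x))
    white = vecs @ np.diag(vals ** -0.5) @ vecs.T @ x

    def kurt(s):
        # Excess kurtosis; zero for a Gaussian signal.
        return np.mean(s ** 4) / np.mean(s ** 2) ** 2 - 3.0

    best, best_score = white, -np.inf
    for a in np.linspace(0.0, np.pi / 2, n_angles):
        rot = np.array([[np.cos(a), -np.sin(a)],
                        [np.sin(a),  np.cos(a)]])
        y = rot @ white
        score = abs(kurt(y[0])) + abs(kurt(y[1]))
        if score > best_score:
            best, best_score = y, score
    return best
```

Separation is only up to sign, scale, and permutation, which is the usual ICA ambiguity; image-domain methods such as [11] apply the same principle to pixel intensities of two mixed images.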

To date, few methods have been proposed for tracking on mixed images. Before using the Kanade–Lucas–Tomasi feature tracker to track an object in regions of reflections, the method in [21] applies layer separation [13] to temporally aligned frames to extract the background and foreground layers; tracks on the separated layers are longer than those on the mixed images. Since the focus of tracking is the target rather than the transmitted or reflected scene, layer separation [13] also extracts the dynamic layer before single object tracking in [22]. Based on the framework of the particle filter with a compensated motion model [23], the correction stage reweights particles using RGB and [I, R-G, Y-B] color histograms of the mixed images [22]. The [I, R-G, Y-B] color histogram is generated with the aid of a mask indicating the dynamic regions of the mixed image. Each particle weight is then optimized using maximum likelihood. One problem with Chen et al. [22] is that tracking accuracy decreases when videos contain camera motion. This is mainly because the inaccurate static layer (i.e., the reflectance image) lets edges of the background and reflections contaminate the dynamic layer (i.e., the illumination image), and measurements that refer to the resulting inaccurate mask degrade the estimation accuracy of the correction stage. Since few previous trackers tackle the problem of reflection interference, and multi-target tracking would require in-depth discussion of the interacting multiple model (IMM), data association, and state estimation [24], this paper focuses on achieving robust single object tracking using multiple cue integration for mobile mixed images. This paper improves the work in [22]; its major contributions are as follows. (1) The proposed scheme improves tracking accuracy under reflections by combining co-inference [4] and maximum likelihood for visual cue integration. (2) The proposed particle filter based scheme realizes co-inference using sequential importance resampling (SIR) [25] instead of sequential importance sampling (SIS) to avoid the degeneracy problem. (3) Layer separation with motion compensation is proposed to extract objects with active motion from mobile images. As a result, the proposed scheme significantly improves tracking accuracy on mobile mixed images. The remainder of this paper is organized as follows. Section 2 reviews the compensated motion model for tracking with mobile cameras [23]. Section 3 proposes motion compensated layer separation for images with camera motion. Section 4 proposes a robust single object tracking scheme that combines co-inference [4] and maximum likelihood for mixed images. Section 5 analyzes experimental results, and Section 6 concludes this paper.
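The degeneracy problem that motivates contribution (2) can be made concrete with a small sketch (a generic illustration of SIR, not the paper's implementation; function names are ours): when the effective sample size collapses because a few particles carry almost all the weight, systematic resampling redraws particles in proportion to their weights.

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights; a value far
    below the particle count signals degeneracy."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng=None):
    """Systematic resampling: draw one uniform offset and take N
    evenly spaced positions through the weight CDF. Returns the
    indices of the particles to duplicate; O(N), low variance."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)
```

A common policy is to resample only when `effective_sample_size(w) < n / 2`, keeping particle diversity the rest of the time.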

Section snippets

Overview of the compensated motion model for tracking on mobile images

The particle filter (PF) implements the Bayesian filter recursively using the sequential Monte Carlo method [26]. Bayesian tracking consists of prediction and correction stages that estimate the target state through the posterior probability density function (pdf). Prediction obtains the prior pdf of the target state x_t at time t by

p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1},

where z_{1:t-1} = {z_1, z_2, …, z_{t-1}} is the set of observations up to time t-1. The correction stage updates the posterior pdf p(x
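The two stages above can be sketched as follows (a minimal generic illustration; the additive camera-motion offset stands in for the compensated motion model of [23], and all function names are our assumptions):

```python
import numpy as np

def predict(particles, camera_motion, noise_std, rng):
    """Prediction stage: propagate particles through the motion
    model. Here a compensated model is approximated by adding the
    estimated camera-motion offset plus Gaussian process noise."""
    return particles + camera_motion + rng.normal(0.0, noise_std, particles.shape)

def correct(particles, weights, likelihood_fn):
    """Correction stage: reweight each particle by the observation
    likelihood of its state, then renormalize."""
    weights = weights * np.array([likelihood_fn(p) for p in particles])
    return weights / weights.sum()

def estimate(particles, weights):
    """Posterior mean estimate of the target state."""
    return np.average(particles, axis=0, weights=weights)
```

With a likelihood peaked near the true target state, the weighted mean is pulled toward that state even when the prior particle cloud is centered elsewhere.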

Layer separation using motion compensation for mobile images

An image can be represented as the product of a reflectance image and an illumination image. By layer separation, an image can be decomposed into these intrinsic images [13], [20], whether or not it contains reflections from a transparent surface. Assume that the reflectance is constant and the illumination changes. Layer separation in [13] estimates the intrinsic images based on the sparse prior over derivative filters on the
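Under the constant-reflectance assumption above, the multiplicative model I_t = R · L_t becomes additive in the log domain, log I_t = log R + log L_t, and the temporal median over log frames approximates the static layer. The sketch below is a simplified pixel-wise variant for illustration only; the method of [13] applies the median to derivative-filtered log images and then solves a reconstruction problem, which is omitted here.

```python
import numpy as np

def separate_layers(frames, eps=1e-6):
    """Simplified intrinsic-image separation for a frame stack
    with static reflectance R and changing illumination L_t:
    log R is estimated as the temporal median of log frames, and
    the residual yields the dynamic illumination layers."""
    log_frames = np.log(np.asarray(frames, dtype=float) + eps)
    log_r = np.median(log_frames, axis=0)   # static (reflectance) layer
    log_l = log_frames - log_r              # dynamic (illumination) layers
    return np.exp(log_r), np.exp(log_l)
```

The median makes the static estimate robust to frames where the illumination layer deviates briefly, which is exactly the dynamic content the tracker wants isolated.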

The proposed tracking scheme using visual cue integration for mobile mixed images

Tracking accuracy based on a single cue is limited since features (e.g., color) of the target appearance usually change as the target moves into a reflection region. Regarding color vision, physicists generally support the trichromatic (Young-Helmholtz) theory [28], while psychologists often favor the opponent-process theory [29]. Thus, based on the particle filter with a compensated motion model [23], particle weights are optimized using both RGB and [I, R-G, Y-B] color histograms on the
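The two color cues above can be sketched as follows (an illustrative implementation under common conventions, not the paper's code: the opponent-axis formulas and the Bhattacharyya-coefficient likelihood are our assumptions, and the maximum-likelihood step is reduced to picking the better-scoring cue):

```python
import numpy as np

def rgb_to_opponent(pixels):
    """Map RGB pixels (n, 3) to an opponent [I, R-G, Y-B] space:
    intensity, red-green, and yellow-blue axes."""
    r, g, b = pixels[:, 0], pixels[:, 1], pixels[:, 2]
    i = (r + g + b) / 3.0
    rg = r - g
    yb = (r + g) / 2.0 - b
    return np.stack([i, rg, yb], axis=1)

def histogram_likelihood(pixels, ref_hist, bins, ranges):
    """Likelihood of a candidate region: Bhattacharyya coefficient
    between its color histogram and the reference histogram."""
    hist, _ = np.histogramdd(pixels, bins=bins, range=ranges)
    hist = hist / max(hist.sum(), 1)
    return np.sum(np.sqrt(hist * ref_hist))

def fuse_max_likelihood(lik_rgb, lik_opp):
    """Maximum-likelihood cue selection: keep whichever cue better
    explains the current observation."""
    return max(lik_rgb, lik_opp)
```

A particle's final weight can then use the fused value, so that whichever color space remains discriminative inside the reflection region dominates.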

Experimental results

In this section, the analyses are divided into two parts. The first part compares the proposed scheme with state-of-the-art tracking methods (e.g., Chen et al. [22]) designed for robust tracking on mixed images. This part also provides a comparison with the fast L1 tracker [6], a method for tracking on images without reflections, to show the difference in performance between trackers designed for mixed images and trackers for images without reflections. For this purpose, most of the

Conclusions

Reflections cannot be avoided in the real world. This paper applies motion compensation not only to the prediction stage of the particle filter but also to layer separation, making the tracker robust against camera motion. To maximize the observation likelihood in the face of reflections, the proposed single object tracking scheme combines co-inference [4] and maximum likelihood to fuse RGB color and motion cues. The proposed scheme significantly outperforms the fast L1 tracker [6] and Chen et al. [22].

Acknowledgment

This work was supported in part by the Ministry of Science and Technology of Taiwan under the Grants NSC-101-2221-E-008-060 and MOST-103-2221-E-008-061.

References (39)

  • C.J. Xie et al., Collaborative object tracking model with local sparse representation, J. Vis. Commun. Image Represent. (2014)
  • H. Bay et al., SURF: Speeded up robust features, Comput. Vis. Image Underst. (2008)
  • K. Nummiaro et al., An adaptive color based particle filter, Image Vis. Comput. (2003)
  • D. Serby, E.K. Meier, L. Van Gool, Probabilistic object tracking using multiple features, in: Proc. IEEE International...
  • M. Yang, F.J. Lv, W. Xu, Y.H. Gong, Detection driven adaptive multi-cue integration for multiple human tracking, in:...
  • X. Li et al., Visual tracking with spatio-temporal Dempster-Shafer information fusion, IEEE Trans. Image Process. (2013)
  • Y. Wu et al., Robust visual tracking by integrating multiple cues based on co-inference learning, Int. J. Comput. Vision (2004)
  • X. Mei, H. Ling, Robust visual tracking using l1 minimization, in: Proc. IEEE International Conference on Computer...
  • C. Bao, Y. Wu, H. Ling, H. Ji, Real time robust l1 tracker using accelerated proximal gradient approach, in: Proc. IEEE...
  • L.F. Wang et al., Visual tracking via kernel sparse representation with multikernel fusion, IEEE Trans. Circuits Syst. Video Technol. (2014)
  • S. Shafer, Using Color to Separate Reflection Components, Technical report TR-136, Department of Computer Science,...
  • A. Cichocki et al., Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications (2002)
  • H. Farid, E. Adelson, Separating reflection and lighting using independent component analysis, in: Proc. IEEE...
  • B. Sarel, M. Irani, Separating transparent layers through layer information exchange, in: Proc. European Conference on...
  • Y. Weiss, Deriving intrinsic images from image sequences, in: Proc. IEEE International Conference on Computer Vision,...
  • A. Levin et al., User assisted separation of reflections from a single image using a sparsity prior, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • Q. Yan, Y. Xu, X.K. Yang, Separation of weak reflection from a single superimposed image using gradient profile...
  • X.J. Guo, X.C. Cao, Y. Ma, Robust separation of reflection from multiple images, in: Proc. IEEE International...
  • K. Gai et al., Blind separation of superimposed moving images using image statistics, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
This paper has been recommended for acceptance by Yehoshua Zeevi.