Neural Networks

Volume 88, April 2017, Pages 90-104

Adaptive low-rank subspace learning with online optimization for robust visual tracking

https://doi.org/10.1016/j.neunet.2017.02.002

Abstract

In recent years, sparse and low-rank models have been widely used to formulate appearance subspaces for visual tracking. However, most existing methods only consider the sparsity or low-rankness of the coefficients, which is not sufficient for appearance subspace learning on complex video sequences. Moreover, as both the low-rank and the column-sparse measures are tightly related to all the samples in a sequence, it is challenging to incrementally solve optimization problems with both nuclear norm and column-sparse norm on sequentially obtained video data. To address the above limitations, this paper develops a novel low-rank subspace learning with adaptive penalization (LSAP) framework for subspace-based robust visual tracking. Different from previous work, which often simply decomposes observations into low-rank features and sparse errors, LSAP simultaneously learns the subspace basis, low-rank coefficients and column-sparse errors to formulate the appearance subspace. Within the LSAP framework, we introduce a Hadamard product based regularization that incorporates rich generative/discriminative structure constraints to adaptively penalize the coefficients for subspace learning. It is shown that such adaptive penalization can significantly improve the robustness of LSAP on severely corrupted datasets. To utilize LSAP for online visual tracking, we also develop an efficient incremental optimization scheme for nuclear norm and column-sparse norm minimization. Experiments on 50 challenging video sequences demonstrate that our tracker outperforms other state-of-the-art methods.

Introduction

Visual tracking refers to establishing correspondences of an object of interest across successive frames of a dynamic scene (Chen et al., 2016, Song, 2014, Song et al., 2017, Yang et al., 2016, Zhang, Zhang et al., 2014). As one of the core problems of computer vision, visual tracking has numerous applications such as gesture recognition, video surveillance, medical image analysis and human-computer interaction. Since heavy occlusion, illumination change, and complex object motion may occur in complex and dynamic scenes, visual tracking is a challenging task. In the past decades, various algorithms have been proposed to address these difficulties. Existing methods can be roughly divided into two categories: discriminative algorithms (e.g., Avidan, 2004; Babenko, Yang, & Belongie, 2009; Hare, Saffari, & Torr, 2011; Kalal, Mikolajczyk, & Matas, 2012; Zhang, Zhang, & Yang, 0000) and generative algorithms (e.g., Kwon & Lee, 2011; Liu, Jin, Su, & Zhang, 2014; Liu, Bai, Su, Zhang, & Sun, 2015; Mei & Ling, 2009; Zhang, Ghanem, Liu, & Ahuja, 2012a).

Discriminative algorithms formulate visual tracking as a binary classification problem, extracting discriminative features to distinguish the target from the local background. For example, online boosting methods (Grabner and Bischof, 2006, Grabner et al., 2006) design an online boosting classifier to extract and maintain the most discriminative features for visual tracking. Yang, Lu, and Yang (2014) present a discriminative appearance model based on superpixels, which helps the tracker distinguish the target from the background using mid-level cues. However, when only limited data are available, discriminative models may not perform better than generative models (Ng & Jordan, 2002).

In contrast, generative algorithms model the appearance of the target and search for the candidate that is most similar to the object. In recent years, the subspace, as an efficient geometric representation, has become the most popular choice in generative visual tracking. For example, incremental visual tracking (IVT) (Ross, Lim, Lin, & Yang, 2008) models the target with a principal component analysis (PCA) (Jolliffe, 2002, Liu et al., 2012) subspace. Moreover, its incremental update strategy can adapt to appearance changes efficiently. However, IVT learns and updates the object subspace from corrupted observations; thus, the brittleness of PCA under partial occlusion may degrade its performance.
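
For intuition, the following is a minimal sketch of how a PCA appearance subspace scores a candidate by reconstruction error, in the spirit of IVT; the orthonormal basis U and mean mu are assumed to be given, and this is not IVT's incremental SVD update itself.

```python
import numpy as np

def pca_candidate_error(candidate, U, mu):
    """Reconstruction error of a vectorized candidate patch with respect to a
    PCA appearance subspace (orthonormal basis U, mean mu). Subspace-based
    trackers in the IVT spirit prefer candidates with small error."""
    centered = candidate - mu
    coeff = U.T @ centered            # project onto the learned subspace
    residual = centered - U @ coeff   # component not explained by the subspace
    return float(np.sum(residual ** 2))

# Toy usage: 1024-dimensional patches, a 16-dimensional subspace.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((1024, 16)))
mu = rng.standard_normal(1024)
candidate = mu + U @ rng.standard_normal(16)      # lies in the subspace
print(pca_candidate_error(candidate, U, mu))      # approximately zero
```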

Sparse representation has proven to be an efficient way to model subspace structure in vision data (Wright, Yang, Ganesh, Sastry, & Ma, 2009). Motivated by this idea, Mei and Ling (2009) proposed an ℓ1-minimization-based formulation to model the appearance subspace for tracking: each target candidate is represented as a sparse linear combination of target and trivial templates. However, this tracker cannot exploit redundant and rich image properties, so sparse prototypes (SP) (Wang, Lu, & Yang, 2013) combine the advantages of both PCA and sparse representation to model the appearance of the target. Unfortunately, as these methods find the sparsest representation coefficients for each frame individually, they cannot capture the global subspace structure of the target.
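
For illustration, here is a minimal ISTA sketch of this ℓ1 representation idea, where a candidate is coded over target templates augmented with trivial (identity) templates; it is a generic solver sketch under illustrative dimensions, not the authors' implementation.

```python
import numpy as np

def ista_l1(D, y, lam=0.1, n_iter=200):
    """Solve min_c 0.5*||y - D c||_2^2 + lam*||c||_1 with ISTA
    (iterative soft thresholding)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = c - step * (D.T @ (D @ c - y))   # gradient step on the quadratic term
        c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return c

# A candidate y is coded over [target templates | trivial templates];
# the trivial (identity) templates absorb pixel-wise occlusions.
rng = np.random.default_rng(1)
d, k = 256, 10
T = rng.standard_normal((d, k))
D = np.hstack([T, np.eye(d)])
y = T @ rng.standard_normal(k)
c = ista_l1(D, y)                            # sparse coefficients for y
```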

It is also well known that, under suitable conditions, the low-rank and sparse decomposition based robust principal component analysis (RPCA) (Candès, Li, Ma, & Wright, 2011) model can exactly recover a low-rank matrix from corrupted observations to represent the subspace. Unfortunately, the optimization of RPCA requires solving a nuclear norm minimization over the whole data matrix, which leads to high computational complexity. Moreover, in RPCA, the nuclear norm minimization problem has to be recomputed on the whole dataset whenever new samples arrive. Thus, standard RPCA is not suitable for sequential data analysis problems such as visual tracking. To address these limitations, Liu, Lin, Su, and Gao (2014) develop a linear-time optimization scheme for RPCA. Following this idea, the work in Wang, Liu, and Su (2015) and Zhang, Liu, Qiu, and Su (2014) also proposes incremental extensions of RPCA to learn low-rank features for object tracking. However, as low-rank features cannot be used to explicitly represent the subspace, additional postprocessing has to be applied to the learned low-rank features to obtain the final appearance subspace representation. More importantly, all the above-mentioned low-rank methods are designed for general data, so no structure constraints specific to appearance subspaces in object tracking are considered in their formulations. As can be seen in Fig. 1, such generically designed subspace learning based trackers cannot track the target well in challenging sequences. In addition, the filtering scheme proposed in Liu, Lin et al. (2014) cannot be used for column-sparse optimization (e.g., ℓ2,1 norm minimization), which is more suitable for the frame-specific corruptions in video sequences.
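
For reference, the RPCA model of Candès et al. (2011) decomposes the observation matrix D into a low-rank part L and a sparse error part E:

```latex
\min_{L,\,E}\; \|L\|_{*} + \lambda \,\|E\|_{1}
\quad \text{s.t.} \quad D = L + E,
```

where ‖·‖_* is the nuclear norm and λ is typically set to 1/√max(m, n) for an m×n observation matrix. Solving this problem requires repeated SVDs of the full data matrix, which is the computational bottleneck discussed above.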

In this paper, we design a novel subspace learning model, called low-rank subspace learning with adaptive penalization (LSAP), to specifically address appearance subspace learning for object tracking. Different from general subspace learning methods, which do not consider constraints specific to tracking, we introduce an adaptive penalization to incorporate structure constraints of the appearance subspace into our model. First, by using the sample-subspace distance as a generative penalization, we can remove samples that are far from the previously learned appearance subspace. Second, we train a foreground/background confidence measure as a discriminative penalization. In this way, LSAP can remove candidate samples lying in the background region and thus build a more accurate appearance subspace for the object in the foreground region. To formulate frame-specific corruptions in video sequences, we utilize the ℓ2,1 norm rather than the ℓ1 norm as the sparse regularization in our model. However, due to the properties of the ℓ2,1 norm, we cannot use the online updating scheme in Liu, Lin et al. (2014) to solve LSAP. Therefore, as a nontrivial byproduct, we develop a new online optimization algorithm to incrementally update LSAP on sequential data. Compared with existing methods, the contributions of our work can be summarized as follows:

  • Compared with RPCA, which needs additional postprocessing to build an explicit subspace representation (e.g., a subspace basis) from the learned low-rank features, LSAP simultaneously learns the subspace basis, low-rank coefficients and column-sparse errors through a single optimization model.

  • Most general subspace learning models do not consider any structure constraints specific to the object tracking problem. In contrast, we introduce an adaptive penalization to incorporate rich generative and discriminative information into the LSAP model for appearance subspace learning. Thus, LSAP can extract the correct appearance even when the observations are severely corrupted (see Fig. 5, Fig. 6).

  • To address the online optimization issues in LSAP, we extend the filtering idea in Liu, Lin et al. (2014) and develop an efficient incremental optimization scheme for ℓ2,1 norm minimization (the column-wise shrinkage operator underlying such schemes is sketched after this list). Our proposed numerical scheme can also be used to solve other ℓ2,1 regularized low-rank learning problems, such as Liu et al. (2013). Table 1 shows that our method is much faster than standard RPCA.

  • By incorporating a novel convolutional location estimation process into our framework, we further improve the accuracy of target localization.
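
As referenced in the third contribution above, the column-wise shrinkage (proximal) operator of the ℓ2,1 norm is the basic building block that incremental schemes of this kind rely on. Below is a minimal sketch of this standard operator, not the authors' full online algorithm.

```python
import numpy as np

def prox_l21(E, tau):
    """Proximal operator of tau * ||E||_{2,1}: each column of E is scaled by
    max(0, 1 - tau / ||column||_2), so low-energy columns are zeroed out
    entirely, matching the frame-specific (column-wise) corruption assumption."""
    norms = np.linalg.norm(E, axis=0)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return E * scale

E = np.array([[3.0, 0.1],
              [4.0, 0.1]])
print(prox_l21(E, 1.0))   # first column shrunk, second column set to zero
```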

The remainder of the paper is organized as follows. In Section 2, the most relevant work is reviewed. Section 3 gives a detailed description of the proposed approach and the construction of the penalization matrix W. The online optimization scheme of our model is described in detail in Section 4. Section 5 introduces visual applications of the LSAP model. Experimental results on visual tracking are reported and analyzed in Section 6. Section 7 concludes the paper and discusses future work.

Section snippets

Previous work

In the past decades, various subspace learning formulations have been proposed for visual tracking. In this section, the most relevant algorithms are discussed.

Motivation and formulation

To explicitly represent the subspace, we want the subspace basis and the low-rank coefficients to be learned simultaneously. By exploiting the incremental extension of RPCA, we develop a model which learns the subspace basis, low-rank coefficients and sparse errors simultaneously. Furthermore, to the best of our knowledge, RPCA can exactly recover low-rank features in most cases. However, when the samples are severely corrupted (e.g., Fig. 5(b)), RPCA regards the errors as the part of
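
The snippet above is truncated. For orientation only, one plausible form of such a model, written here as an assumption based on the abstract rather than as the paper's exact objective, is

```latex
\min_{U,\,Z,\,E}\; \|Z\|_{*} \;+\; \alpha\,\|W \odot Z\|_{1} \;+\; \lambda\,\|E\|_{2,1}
\quad \text{s.t.} \quad X = U Z + E,\;\; U^{\top} U = I,
```

where X stacks the vectorized observations, U is the subspace basis, Z the low-rank coefficients, E the column-sparse errors, W the adaptive generative/discriminative weights, and ⊙ the Hadamard product.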

Online optimization

Previous optimization algorithms cannot be directly applied to the LSAP model for the following three reasons. First, solving the nuclear norm minimization requires SVD, which leads to extremely high computational complexity. Second, the orthogonality constraint is difficult to handle because it is not only non-convex but also numerically expensive to preserve during iterations. Third, since the column-sparse norm is closely related to all samples in the sequence, the online
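
For context, the proximal operator of the nuclear norm is singular value thresholding, which makes the SVD cost mentioned above explicit; the following is a minimal sketch of this standard operator, not the proposed online scheme.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding, the proximal operator of tau * ||A||_*.
    It requires a full SVD, which is what makes naive nuclear norm
    minimization expensive on long video sequences."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

A = np.outer(np.arange(1.0, 5.0), np.arange(1.0, 4.0))  # rank-1 test matrix
print(np.linalg.matrix_rank(svt(A, 0.5)))                # stays low-rank (prints 1)
```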

Facial image analysis

Extracting intrinsic features from facial data is the most important part of facial analysis. We apply our model to the CMU PIE database (Sim, Baker, & Bsat, 2002) to verify its performance for facial feature extraction. In the experiment, we randomly choose 70 images of each person under different illuminations and expressions, and we remove a small portion of the information in each chosen image as shown in Fig. 5(a) and (b). To the best of our knowledge, RPCA (Candès et al., 2011) works

Experiments

To validate the efficiency and effectiveness of the LSAP tracker, we provide extensive experimental results, organized in two parts. First, we give an overview of the video datasets on which we test our model with the ℓ1 and ℓ2,1 norms. Then, the overall evaluation is reported. We compare our trackers, which use generative/discriminative penalization with/without location estimation, against sixteen other state-of-the-art trackers including CNT (Zhang, Liu, Wu, & Yang, 2016), KCF (

Conclusion

This paper presents a low-rank subspace learning model with adaptive penalization, which can extract the low-rank coefficients, sparse errors and subspace basis simultaneously. By adaptively penalizing the corresponding coefficients, the subspace learning process is supervised and a better subspace can be obtained. In addition, we develop a new online optimization scheme for the LSAP model with a column-sparse norm. The experimental results on the facial data indicate that LSAP can exactly recover

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Nos. 61672125, 61300086, 61572096, and 61432003), the Fundamental Research Funds for the Central Universities (DUT15QY15) and the Hong Kong Scholar Program (No. XJ2015008).

References (52)

  • Grabner, H., & Bischof, H. (2006). On-line boosting and vision. In...
  • Grabner, H., Grabner, M., & Bischof, H. (2006). Real-time tracking via on-line boosting. In BMVC, Vol. 1 (p....
  • Hare, S., Saffari, A., & Torr, P. (2011). Struck: Structured output tracking with kernels. In ICCV (pp....
  • Henriques, J. F., et al. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Isard, M., & Blake, A. (1998). Condensation-conditional density propagation for visual tracking. In IJCV, Vol. 29 (pp....
  • Jia, X., Lu, H., & Yang, M.-H. (2012). Visual tracking via adaptive structural local sparse appearance model. In CVPR...
  • Jolliffe, I. (2002). Principal component analysis.
  • Kalal, Z., et al. (2012). Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Kwon, J., & Lee, K. (2011). Tracking by sampling trackers. In ICCV (pp....
  • Li, Y., et al. (2008). Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Lin, Z., et al. (2013). Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning.
  • Lin, Z., Liu, R., & Su, Z. (2011). Linearized alternating direction method with adaptive penalty for low-rank...
  • Liu, R., et al. (2015). Robust visual tracking via l0 regularized local low-rank feature learning. Journal of Electronic Imaging.
  • Liu, R., et al. (2014). Latent subspace projection pursuit with online optimization for robust visual tracking. IEEE MultiMedia.
  • Liu, R., Lin, Z., la Torre, F. D., & Su, Z. (2012). Fixed-rank representation for unsupervised visual learning. In...
  • Liu, G., et al. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.