Neural Networks

Volume 88, April 2017, Pages 90-104

Adaptive low-rank subspace learning with online optimization for robust visual tracking

https://doi.org/10.1016/j.neunet.2017.02.002

Abstract

In recent years, sparse and low-rank models have been widely used to formulate appearance subspaces for visual tracking. However, most existing methods only consider the sparsity or low-rankness of the coefficients, which is not sufficient for appearance subspace learning on complex video sequences. Moreover, as both the low-rank and the column-sparse measures are tightly related to all the samples in a sequence, it is challenging to incrementally solve optimization problems with both nuclear norm and column-sparse norm on sequentially obtained video data. To address the above limitations, this paper develops a novel low-rank subspace learning with adaptive penalization (LSAP) framework for subspace-based robust visual tracking. Different from previous work, which often simply decomposes observations into low-rank features and sparse errors, LSAP simultaneously learns the subspace basis, low-rank coefficients and column-sparse errors to formulate the appearance subspace. Within the LSAP framework, we introduce a Hadamard product based regularization that incorporates rich generative/discriminative structure constraints to adaptively penalize the coefficients for subspace learning. It is shown that such adaptive penalization can significantly improve the robustness of LSAP on severely corrupted datasets. To utilize LSAP for online visual tracking, we also develop an efficient incremental optimization scheme for nuclear norm and column-sparse norm minimization. Experiments on 50 challenging video sequences demonstrate that our tracker outperforms other state-of-the-art methods.

Introduction

Visual tracking refers to establishing correspondences of an object of interest across successive frames of a dynamic scene (Chen et al., 2016, Song, 2014, Song et al., 2017, Yang et al., 2016, Zhang, Zhang et al., 2014). As one of the core problems of computer vision, visual tracking has numerous applications such as gesture recognition, video surveillance, medical image analysis and human-computer interaction. Since heavy occlusion, illumination change, and complex object motion may occur in complex and dynamic scenes, visual tracking is a challenging task. In the past decades, various algorithms have been proposed to address these difficulties. Existing methods can be roughly divided into two categories: discriminative algorithms (e.g., Avidan, 2004; Babenko, Yang, & Belongie, 2009; Hare, Saffari, & Torr, 2011; Kalal, Mikolajczyk, & Matas, 2012; Zhang, Zhang, & Yang, 0000) and generative algorithms (e.g., Kwon & Lee, 2011; Liu, Jin, Su, & Zhang, 2014; Liu, Bai, Su, Zhang, & Sun, 2015; Mei & Ling, 2009; Zhang, Ghanem, Liu, & Ahuja, 2012a).

Discriminative algorithms formulate visual tracking as a binary classification problem, extracting discriminative features to distinguish the target from the local background. For example, online boosting methods (Grabner and Bischof, 2006, Grabner et al., 2006) design an online boosting classifier to extract and maintain the most discriminative features for visual tracking. Yang, Lu, and Yang (2014) present a discriminative appearance model based on superpixels, which helps the tracker distinguish the target from the background using mid-level cues. However, when only limited data are available, discriminative models may not perform better than generative models (Ng & Jordan, 2002).

In contrast, generative algorithms model the appearance of the target and search for the candidate that is most similar to the object. In recent years, the subspace, as an efficient geometric representation, has become the most popular choice in generative visual tracking. For example, incremental visual tracking (IVT) (Ross, Lim, Lin, & Yang, 2008) models the target with a principal component analysis (PCA) (Jolliffe, 2002, Liu et al., 2012) subspace. Moreover, its incremental update strategy can adapt to appearance changes efficiently. However, IVT learns and updates the object subspace from corrupted observations; thus, the brittleness of PCA under partial occlusion may degrade its performance.
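
For intuition, the following is a minimal sketch of how a PCA appearance subspace scores a candidate by reconstruction error, in the spirit of IVT; the orthonormal basis U and mean mu are assumed to be given, and this is not IVT's incremental SVD update itself.

```python
import numpy as np

def pca_candidate_error(candidate, U, mu):
    """Reconstruction error of a vectorized candidate patch with respect to a
    PCA appearance subspace (orthonormal basis U, mean mu). Subspace-based
    trackers in the IVT spirit prefer candidates with small error."""
    centered = candidate - mu
    coeff = U.T @ centered            # project onto the learned subspace
    residual = centered - U @ coeff   # component not explained by the subspace
    return float(np.sum(residual ** 2))

# Toy usage: 1024-dimensional patches, a 16-dimensional subspace.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((1024, 16)))
mu = rng.standard_normal(1024)
candidate = mu + U @ rng.standard_normal(16)      # lies in the subspace
print(pca_candidate_error(candidate, U, mu))      # approximately zero
```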

Sparse representation has proven to be an efficient way to model subspace structure in vision data (Wright, Yang, Ganesh, Sastry, & Ma, 2009). Motivated by this idea, Mei and Ling (2009) proposed an ℓ1-minimization-based formulation to model the appearance subspace for tracking: each target candidate is represented as a sparse linear combination of target and trivial templates. However, this tracker cannot exploit redundant and rich image properties, so sparse prototypes (SP) (Wang, Lu, & Yang, 2013) combine the advantages of both PCA and sparse representation to model the appearance of the target. Unfortunately, as these methods find the sparsest representation coefficients for each frame individually, they cannot capture the global subspace structure of the target.
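
For illustration, here is a minimal ISTA sketch of this ℓ1 representation idea, where a candidate is coded over target templates augmented with trivial (identity) templates; it is a generic solver sketch under illustrative dimensions, not the authors' implementation.

```python
import numpy as np

def ista_l1(D, y, lam=0.1, n_iter=200):
    """Solve min_c 0.5*||y - D c||_2^2 + lam*||c||_1 with ISTA
    (iterative soft thresholding)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = c - step * (D.T @ (D @ c - y))   # gradient step on the quadratic term
        c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return c

# A candidate y is coded over [target templates | trivial templates];
# the trivial (identity) templates absorb pixel-wise occlusions.
rng = np.random.default_rng(1)
d, k = 256, 10
T = rng.standard_normal((d, k))
D = np.hstack([T, np.eye(d)])
y = T @ rng.standard_normal(k)
c = ista_l1(D, y)                            # sparse coefficients for y
```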

It is also well known that, under suitable conditions, the low-rank and sparse decomposition based robust principal component analysis (RPCA) (Candès, Li, Ma, & Wright, 2011) model can exactly recover a low-rank matrix from corrupted observations to represent the subspace. Unfortunately, the optimization of RPCA requires solving a nuclear norm minimization over the whole data matrix, which leads to high computational complexity. Moreover, in RPCA, the nuclear norm minimization problem has to be recomputed on the whole dataset whenever new samples arrive. Thus, standard RPCA is not suitable for sequential data analysis problems such as visual tracking. To address these limitations, Liu, Lin, Su, and Gao (2014) develop a linear-time optimization scheme for RPCA. Following this idea, the work in Wang, Liu, and Su (2015) and Zhang, Liu, Qiu, and Su (2014) also proposes incremental extensions of RPCA to learn low-rank features for object tracking. However, as low-rank features cannot be used to explicitly represent the subspace, additional postprocessing has to be applied to the learned low-rank features to obtain the final appearance subspace representation. More importantly, all the above-mentioned low-rank methods are designed for general data, so no structure constraints specific to appearance subspaces in object tracking are considered in their formulations. As can be seen in Fig. 1, such generically designed subspace learning based trackers cannot track the target well in challenging sequences. In addition, the filtering scheme proposed in Liu, Lin et al. (2014) cannot be used for column-sparse optimization (e.g., ℓ2,1 norm minimization), which is more suitable for the frame-specific corruptions in video sequences.
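
For reference, the RPCA model of Candès et al. (2011) decomposes the observation matrix D into a low-rank part L and a sparse error part E:

```latex
\min_{L,\,E}\; \|L\|_{*} + \lambda \,\|E\|_{1}
\quad \text{s.t.} \quad D = L + E,
```

where ‖·‖_* is the nuclear norm and λ is typically set to 1/√max(m, n) for an m×n observation matrix. Solving this problem requires repeated SVDs of the full data matrix, which is the computational bottleneck discussed above.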

In this paper, we design a novel subspace learning model, called low-rank subspace learning with adaptive penalization (LSAP), to specifically address appearance subspace learning for object tracking. Different from general subspace learning methods, which do not consider constraints specific to tracking, we introduce an adaptive penalization to incorporate structure constraints of the appearance subspace into our model. First, by using the sample-subspace distance as a generative penalization, we can remove samples that are far from the previously learned appearance subspace. Second, we train a foreground/background confidence measure as a discriminative penalization. In this way, LSAP can remove candidate samples lying in the background region and thus build a more accurate appearance subspace for the object in the foreground region. To formulate frame-specific corruptions in video sequences, we utilize the ℓ2,1 norm rather than the ℓ1 norm as the sparse regularization in our model. However, due to the properties of the ℓ2,1 norm, we cannot use the online updating scheme in Liu, Lin et al. (2014) to solve LSAP. Therefore, as a nontrivial byproduct, we develop a new online optimization algorithm to incrementally update LSAP on sequential data. Compared with existing methods, the contributions of our work can be summarized as follows:

  • Compared with RPCA, which needs additional postprocessing to build an explicit subspace representation (e.g., a subspace basis) from the learned low-rank features, LSAP simultaneously learns the subspace basis, low-rank coefficients and column-sparse errors through a single optimization model.

  • Most general subspace learning models do not consider any structure constraints specific to the object tracking problem. In contrast, we introduce an adaptive penalization to incorporate rich generative and discriminative information into the LSAP model for appearance subspace learning. Thus, LSAP can extract the correct appearance even when the observations are severely corrupted (see Fig. 5, Fig. 6).

  • To address the online optimization issues in LSAP, we extend the filtering idea in Liu, Lin et al. (2014) and develop an efficient incremental optimization scheme for ℓ2,1 norm minimization (the column-wise shrinkage operator underlying such schemes is sketched after this list). Our proposed numerical scheme can also be used to solve other ℓ2,1 regularized low-rank learning problems, such as Liu et al. (2013). Table 1 shows that our method is much faster than standard RPCA.

  • By incorporating a novel convolutional location estimation process into our framework, we further improve the accuracy of target localization.
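
As referenced in the third contribution above, the column-wise shrinkage (proximal) operator of the ℓ2,1 norm is the basic building block that incremental schemes of this kind rely on. Below is a minimal sketch of this standard operator, not the authors' full online algorithm.

```python
import numpy as np

def prox_l21(E, tau):
    """Proximal operator of tau * ||E||_{2,1}: each column of E is scaled by
    max(0, 1 - tau / ||column||_2), so low-energy columns are zeroed out
    entirely, matching the frame-specific (column-wise) corruption assumption."""
    norms = np.linalg.norm(E, axis=0)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return E * scale

E = np.array([[3.0, 0.1],
              [4.0, 0.1]])
print(prox_l21(E, 1.0))   # first column shrunk, second column set to zero
```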

The remainder of the paper is organized as follows. In Section 2, the most relevant work is reviewed. Section 3 gives a detailed description of the proposed approach and the construction of the penalization matrix W. The online optimization scheme of our model is described in detail in Section 4. Section 5 introduces visual applications of the LSAP model. Experimental results on visual tracking are reported and analyzed in Section 6. Section 7 concludes the paper and discusses future work.

Section snippets

Previous work

In the past decades, various subspace learning formulations have been proposed for visual tracking. In this section, the most relevant algorithms are discussed.

Motivation and formulation

To explicitly represent the subspace, we want the subspace basis and the low-rank coefficients to be learned simultaneously. By exploiting the incremental extension of RPCA, we develop a model which learns the subspace basis, low-rank coefficients and sparse errors simultaneously. Furthermore, to the best of our knowledge, RPCA can exactly recover low-rank features in most cases. However, when the samples are severely corrupted (e.g., Fig. 5(b)), RPCA regards the errors as the part of
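
The snippet above is truncated. For orientation only, one plausible form of such a model, written here as an assumption based on the abstract rather than as the paper's exact objective, is

```latex
\min_{U,\,Z,\,E}\; \|Z\|_{*} \;+\; \alpha\,\|W \odot Z\|_{1} \;+\; \lambda\,\|E\|_{2,1}
\quad \text{s.t.} \quad X = U Z + E,\;\; U^{\top} U = I,
```

where X stacks the vectorized observations, U is the subspace basis, Z the low-rank coefficients, E the column-sparse errors, W the adaptive generative/discriminative weights, and ⊙ the Hadamard product.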

Online optimization

Previous optimization algorithms cannot be directly applied to the LSAP model for the following three reasons. First, solving the nuclear norm minimization requires SVD, which leads to extremely high computational complexity. Second, the orthogonality constraint is difficult to handle because it is not only non-convex but also numerically expensive to preserve during iterations. Third, since the column-sparse norm is closely related to all samples in the sequence, the online
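
For context, the proximal operator of the nuclear norm is singular value thresholding, which makes the SVD cost mentioned above explicit; the following is a minimal sketch of this standard operator, not the proposed online scheme.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding, the proximal operator of tau * ||A||_*.
    It requires a full SVD, which is what makes naive nuclear norm
    minimization expensive on long video sequences."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

A = np.outer(np.arange(1.0, 5.0), np.arange(1.0, 4.0))  # rank-1 test matrix
print(np.linalg.matrix_rank(svt(A, 0.5)))                # stays low-rank (prints 1)
```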

Facial image analysis

Extracting intrinsic features from facial data is the most important part of facial analysis. We apply our model to the CMU PIE database (Sim, Baker, & Bsat, 2002) to verify its performance for facial feature extraction. In the experiment, we randomly choose 70 images of each person under different illuminations and expressions, and we remove a small portion of the information in each chosen image as shown in Fig. 5(a) and (b). To the best of our knowledge, RPCA (Candès et al., 2011) works

Experiments

To validate the efficiency and effectiveness of the LSAP tracker, we provide extensive experimental results, organized in two parts. First, we give an overview of the video datasets on which we test our model with the ℓ1 and ℓ2,1 norms. Then, the overall evaluation is reported. We compare our trackers, which use generative/discriminative penalization with/without location estimation, against sixteen other state-of-the-art trackers including CNT (Zhang, Liu, Wu, & Yang, 2016), KCF (

Conclusion

This paper presents a low-rank subspace learning model with adaptive penalization, which can extract the low-rank coefficients, sparse errors and subspace basis simultaneously. By adaptively penalizing the corresponding coefficients, the subspace learning process is supervised and a better subspace can be obtained. In addition, we develop a new online optimization scheme for the LSAP model with a column-sparse norm. The experimental results on the facial data indicate that LSAP can exactly recover

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Nos. 61672125, 61300086, 61572096, and 61432003), the Fundamental Research Funds for the Central Universities (DUT15QY15) and the Hong Kong Scholar Program (No. XJ2015008).

References (52)

  • Grabner, H., & Bischof, H. (2006). On-line boosting and vision. In...
  • Grabner, H., Grabner, M., & Bischof, H. (2006). Real-time tracking via on-line boosting. In BMVC, Vol. 1 (p....
  • Hare, S., Saffari, A., & Torr, P. (2011). Struck: Structured output tracking with kernels. In ICCV (pp....
  • Henriques, J. F., et al. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Isard, M., & Blake, A. (1998). Condensation-conditional density propagation for visual tracking. In IJCV, Vol. 29 (pp....
  • Jia, X., Lu, H., & Yang, M.-H. (2012). Visual tracking via adaptive structural local sparse appearance model. In CVPR...
  • Jolliffe, I. (2002). Principal component analysis.
  • Kalal, Z., et al. (2012). Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Kwon, J., & Lee, K. (2011). Tracking by sampling trackers. In ICCV (pp....
  • Li, Y., et al. (2008). Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Lin, Z., et al. (2013). Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning.
  • Lin, Z., Liu, R., & Su, Z. (2011). Linearized alternating direction method with adaptive penalty for low-rank...
  • Liu, R., et al. (2015). Robust visual tracking via l0 regularized local low-rank feature learning. Journal of Electronic Imaging.
  • Liu, R., et al. (2014). Latent subspace projection pursuit with online optimization for robust visual tracking. IEEE MultiMedia.
  • Liu, R., Lin, Z., la Torre, F. D., & Su, Z. (2012). Fixed-rank representation for unsupervised visual learning. In...
  • Liu, G., et al. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.