
1 Introduction

Visual tracking is one of the most active research directions in computer vision, with wide applications in both military and civilian fields [1, 2].

In recent years, a large number of tracking algorithms have been proposed. Existing algorithms can be divided into two categories [3]: generative methods and discriminative methods. Generative methods, e.g., IVT [4] (incremental visual tracking), L1T [5] (the \( \ell_{1} \) tracker), and MTT [6] (multitask tracking), establish an appearance model of the target and then search for the most similar candidate sample as the current tracking result. Discriminative methods treat tracking as a binary classification between the target and the background; popular discriminative trackers include MIL [7] (multiple instance learning), TLD [8] (tracking-learning-detection), and Struck [9]. Although the above trackers achieve good results under simple, controlled conditions, trackers based on hand-crafted features still face enormous challenges in complex environments, e.g., illumination variation, severe occlusion, and background clutter.

Because Deep Neural Networks (DNNs) [10] can automatically learn robust and powerful feature representations through their deep structure, deep learning based tracking algorithms have gained significant attention in visual tracking. Combining offline pre-training with online fine-tuning, Wang and Yeung [11] first applied the stacked denoising auto-encoder (SDAE) architecture to visual tracking and achieved robust performance in some complicated scenarios. Li et al. [12] applied a single CNN (Convolutional Neural Network) to visual tracking and combined it with multiple image cues to improve the tracking success rate. In [13], Zhang et al. proposed the CNT tracker, which takes advantage of local structure features and global geometric information to improve tracking performance. Ma et al. [14] utilized hierarchical CNN features and obtained state-of-the-art results in complicated tracking situations. With the fast development of deep learning, trackers based on deep learning significantly outperform traditional algorithms in tracking success rate and accuracy.

However, several difficulties of deep learning based trackers remain to be solved. (i) A complex and time-consuming offline pre-training process is indispensable to most existing deep learning based trackers; this process requires an auxiliary large-scale dataset, and the generic representations learned from that dataset may not suit the tracking of a specific object. (ii) Traditional nonlinear activation functions such as sigmoid or tanh have complex mathematical expressions, which causes high computational cost in error back propagation (BP) during network training and reduces the tracking speed. (iii) Trackers such as DLT or CNT use a particle filter to obtain candidate samples; bad particles degrade the tracking performance and easily cause tracking drift, while the randomness of the particles makes the tracking results inconsistent across repeated experiments.

In this work, we propose an online fast deep learning tracker to solve the above problems. The main contributions of our work can be summarized as follows:

(1) We adopt the Rectified Linear Unit (ReLU) as the activation function of Sparse Auto-Encoders (SAE) and build a simple yet effective Deep Sparse Neural Network (DSNN) for tracking. The ReLU and the sparsity constraint make the DSNN highly sparse and remove the need for a complex pre-training process, so the proposed tracker is trained and run entirely online. Meanwhile, the simple mathematical expression of ReLU reduces the computational complexity of training and improves the tracking speed.

(2) To overcome the randomness and drift problems of the particle filter, we adopt a local dense sampling search to generate a local confidence map. By searching for the maximum confidence value, the current position of the target is located accurately. In addition, to balance the quantities of positive and negative samples, a data augmentation technique is applied to the single positive sample.

(3) We present an online adaptive model update strategy aimed at long-term tracking. By establishing a sliding time window and adaptively adjusting the local search area, the update strategy improves the robustness of the proposed tracker in challenging environments.

Extensive experimental results on OTB2013 [15] show that the proposed tracker is effective and efficient in challenging environments compared with state-of-the-art methods. Not only does the proposed tracker dispense with the complicated and time-consuming pre-training process, it also achieves fast and robust online tracking.

2 Deep Sparse Neural Network for Tracking

The sparsity of a neural network means that the features of the input layer are represented by as few hidden neurons as possible. In effect, the network looks for a set of “overcomplete” basis vectors that represent the data efficiently, which yields better sparsity and expressiveness.

2.1 Sparse Auto-Encoders with ReLU

Sparse Auto-Encoder (SAE) [16] is an unsupervised learning model and one of the basic building blocks of deep learning. By stacking multiple SAEs with the layer-by-layer greedy algorithm, we obtain a deep sparse network. Figure 1(a) shows the basic structure of stacked SAEs. Let \( \hat{x}_{i} \) denote the reconstruction of the input \( x_{i} \), let \( \mathbf{W} \) and \( \mathbf{W}' \) denote the weight matrices of the encoder and decoder respectively, and let \( \mathbf{b} \) denote the bias vector of the encoder. In our work, the loss function of the stacked SAEs is defined as:

$$ L\left( \mathbf{W}, \mathbf{b} \right) = \sum\limits_{i = 1}^{m} \left\| \mathbf{x}_{i} - \hat{\mathbf{x}}_{i} \right\|_{2}^{2} + \lambda \left( \left\| \mathbf{W} \right\|_{F}^{2} + \left\| \mathbf{W}' \right\|_{F}^{2} \right) + \mu H\left( \rho \| \hat{\boldsymbol{\rho}} \right) $$
(1)

where m is the number of samples, \( \lambda \) is a penalty factor that balances the reconstruction loss and the weights, \( \mu \) is the sparsity penalty factor, and \( \left\| \cdot \right\|_{F} \) denotes the Frobenius norm. The cross-entropy \( H\left( \rho \| \hat{\boldsymbol{\rho}} \right) \) is given by:

$$ H\left( \rho \| \hat{\boldsymbol{\rho}} \right) = - \sum\limits_{j = 1}^{n} \left[ \rho \log \left( \hat{\rho}_{j} \right) + \left( 1 - \rho \right) \log \left( 1 - \hat{\rho}_{j} \right) \right] $$
(2)
$$ \hat{\rho}_{j} = \frac{1}{m} \sum\limits_{i = 1}^{m} h_{j} \left( x_{i} \right) $$
(3)

where k and n are the numbers of neurons in the input and hidden layers respectively, and \( h_{j}(x_{i}) \) denotes the activation value of the j-th hidden neuron for the input \( x_{i} \). The sparsity target \( \rho \) is close to 0 and is set to 0.05 in our experiments.
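To make Eqs. (1)–(3) concrete, the following NumPy sketch computes the loss of a single SAE layer. It is an illustrative reimplementation rather than the paper's MATLAB code; the sigmoid encoder matches the basic stacked SAEs of Fig. 1(a), and the clipping of \( \hat{\rho}_{j} \) is a practical safeguard to keep the logarithms finite (necessary once the activation is replaced by ReLU as in Fig. 1(c)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_loss(X, W, b, W_dec, b_dec, lam=0.005, mu=0.2, rho=0.05):
    """Loss of one sparse auto-encoder layer, following Eqs. (1)-(3).

    X            : (m, k) batch of m inputs with k features
    W, b         : encoder weight matrix (k, n) and bias (n,)
    W_dec, b_dec : decoder weight matrix (n, k) and bias (k,)
    lam, mu, rho : the values reported in Sect. 4 of the paper
    """
    H = sigmoid(X @ W + b)                 # hidden activations h_j(x_i)
    X_hat = H @ W_dec + b_dec              # reconstructions
    recon = np.sum((X - X_hat) ** 2)       # sum_i ||x_i - x_hat_i||_2^2
    decay = lam * (np.sum(W ** 2) + np.sum(W_dec ** 2))   # squared Frobenius norms
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)     # Eq. (3), kept in (0, 1)
    sparsity = -np.sum(rho * np.log(rho_hat)
                       + (1 - rho) * np.log(1 - rho_hat)) # Eq. (2)
    return recon + decay + mu * sparsity
```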

Fig. 1. The basic stacked SAEs and its variant with ReLU: (a) basic stacked SAEs, (b) activation function curves, and (c) the variant of stacked SAEs with ReLU.

To obtain robust and powerful feature extraction capacity, offline pre-training on a large-scale dataset is usually applied to deep networks. The key benefit of pre-training is to obtain a sparse distributed representation in the deep network [17]. The Rectified Linear Unit (ReLU) [18, 19] is a sparse activation function. As shown in Fig. 1(b), the rectifier function ReLU(x) = max(0, x) is a one-sided activation function that enforces hard zeros in the learned feature representation and leads to sparse hidden units. We therefore adopt ReLU as the activation function of the aforementioned stacked SAEs to improve the sparsity of the DNN. The variant of stacked SAEs with ReLU is shown in Fig. 1(c).

It is shown in [19, 20] that ReLU brings inherent sparsity to DNNs, which makes pre-training much less necessary for networks with ReLU activations. Using ReLU as the activation function therefore allows us to leave out the offline pre-training of the DNN, which also avoids the over-fitting problems associated with pre-training. Meanwhile, the active side of ReLU is a non-saturating linear function, which effectively alleviates the vanishing gradient problem during training. Moreover, since the gradient of ReLU is a fixed value of 1 or 0, no complex gradient calculation is needed during network training, which reduces the computational complexity and improves the training speed.
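To make the computational argument concrete, here is a minimal sketch (our illustration, not the paper's code) of the ReLU forward and backward passes; the backward pass is a boolean mask, with none of the exponentials that sigmoid or tanh require.

```python
import numpy as np

def relu_forward(z):
    # One-sided activation: hard zeros on the negative side induce sparsity.
    return np.maximum(0.0, z)

def relu_backward(z, grad_out):
    # The local derivative is exactly 1 where z > 0 and 0 elsewhere, so
    # back propagation reduces to masking the incoming gradient.
    return grad_out * (z > 0)
```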

2.2 Online Tracking Network

In order to perform tracking, we add a softmax classifier as the last layer of the stacked SAEs to classify the learned features. For the two-class case of target versus background, the softmax classifier reduces to logistic regression:

$$ l_{\theta}\left( \mathbf{t} \right) = \frac{1}{1 + e^{-\theta^{T} \mathbf{t}}} $$
(4)

where \( l_{\theta}(\mathbf{t}) \) is a value in [0, 1] representing the probability that the sample t is the true target, and \( \theta \) denotes the model parameters. The final model of the deep sparse neural network for tracking is shown in Fig. 2.
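As a one-line sketch of Eq. (4) (folding the bias term into \( \theta \) is our assumption):

```python
import numpy as np

def confidence(t, theta):
    """Eq. (4): probability that the feature vector t is the true target.

    t     : (d,) feature vector from the top SAE layer
    theta : (d,) classifier parameters, bias assumed folded in
    """
    return 1.0 / (1.0 + np.exp(-theta @ t))
```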

Fig. 2. The model of the tracking network. The number in each layer denotes the number of neurons in that layer.

3 Proposed Tracking Algorithm

Based on the aforementioned deep sparse neural network, we propose an online fast deep learning tracker. In this section, we describe the proposed tracking algorithm in detail.

3.1 Initialization of the Tracking Network

Given the initial state \( s_{0} = \{ x_{0}, y_{0}, w_{0}, h_{0} \} \) of the target, we obtain a single positive sample patch by sampling in the initial frame, where \( (x_{0}, y_{0}) \) denotes the initial position, and \( w_{0} \) and \( h_{0} \) denote the initial width and height, respectively. Meanwhile, we obtain 100 negative sample patches by random sampling around \( (x_{0}, y_{0}) \). After normalization, all patches become standard gray-scale images of 32 × 32 pixels, which serve as the input data of the tracking network.
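A possible initialization sketch follows. The paper does not specify the negative-sampling radii or the resizing method, so the ring of radii and the nearest-neighbour cropping helper below are our assumptions:

```python
import numpy as np

def crop_and_normalize(frame, x, y, w, h, size=32):
    """Crop a w x h patch centred at (x, y) from a gray-scale frame and
    rescale it to size x size by nearest-neighbour sampling (a simple
    stand-in for whatever resizing the original implementation uses)."""
    ys = np.clip(np.round(y - h / 2 + h * (np.arange(size) + 0.5) / size).astype(int),
                 0, frame.shape[0] - 1)
    xs = np.clip(np.round(x - w / 2 + w * (np.arange(size) + 0.5) / size).astype(int),
                 0, frame.shape[1] - 1)
    return frame[np.ix_(ys, xs)].astype(np.float64) / 255.0

def sample_negatives(frame, x0, y0, w0, h0, num=100, seed=0):
    """Draw negative patches on a ring around the initial position; the
    radii in [0.5, 1.5] * max(w0, h0) are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    negs = []
    for _ in range(num):
        r = rng.uniform(0.5, 1.5) * max(w0, h0)
        a = rng.uniform(0.0, 2.0 * np.pi)
        negs.append(crop_and_normalize(frame, x0 + r * np.cos(a),
                                       y0 + r * np.sin(a), w0, h0))
    return np.stack(negs)          # (num, 32, 32)
```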

Meanwhile, since an imbalance between positive and negative samples affects the robustness of the tracking network, we need to augment the positive samples to balance the two quantities. A method of sampling within 2 pixels of the positive sample for data augmentation was proposed in [11]; however, it is prone to accumulating errors that affect the tracking results. In [21], new data were created by transforming the images, e.g., scaling, translation, rotation, noising, changing brightness, mirroring, and cropping, to expand the quantity of samples. We extend the single positive sample of the initial frame to 10 samples by changing brightness, contrast, and noise, and by smoothing and mirroring, as sketched below. The results are shown in Fig. 3.
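The paper does not give the exact transformation parameters, so the factors in the following sketch (brightness shifts, contrast scaling, Gaussian noise, a 3 × 3 box blur, and horizontal mirroring) are illustrative assumptions that yield the 10 positive samples, counting the original patch:

```python
import numpy as np

def augment_positive(patch, seed=0):
    """Expand one normalized 32 x 32 gray patch (values in [0, 1]) into
    10 positive samples."""
    rng = np.random.default_rng(seed)
    mean = patch.mean()
    pad = np.pad(patch, 1, mode='edge')                       # for a 3x3 box blur
    blur = sum(pad[i:i + 32, j:j + 32]
               for i in range(3) for j in range(3)) / 9.0     # smoothing
    samples = [
        patch,                                                # original
        np.clip(patch + 0.1, 0, 1),                           # brighter
        np.clip(patch - 0.1, 0, 1),                           # darker
        np.clip((patch - mean) * 1.2 + mean, 0, 1),           # higher contrast
        np.clip((patch - mean) * 0.8 + mean, 0, 1),           # lower contrast
        np.clip(patch + rng.normal(0, 0.02, patch.shape), 0, 1),  # Gaussian noise
        blur,                                                 # smoothed
        patch[:, ::-1],                                       # mirrored
        np.clip(patch[:, ::-1] + 0.1, 0, 1),                  # mirrored + brighter
        blur[:, ::-1],                                        # mirrored + smoothed
    ]
    return np.stack(samples)                                  # (10, 32, 32)
```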

Fig. 3. Data augmentation for the single positive sample.

Using these 10 positive samples and 100 negative samples as labeled data, we obtain the network parameters for the specific tracking task by training the network of Fig. 2.

3.2 Local Confidence Maps

During tracking, each sample patch receives a value in [0, 1] from the softmax classifier of the tracking network. This value reflects the probability that the sample patch is a positive sample (i.e., the target), and we call it the “confidence value” of the patch. In the proposed algorithm, we use a local dense sampling method that takes every pixel in the candidate area as a sampling center. Feeding all sample patches into the tracking network yields the confidence values of all pixels in the candidate area. As shown in Fig. 4, the local confidence map of the candidate area is obtained by visualizing all confidence values, which intuitively reflects the possible position of the target in the local area.

Fig. 4. Local confidence maps of some videos. Darker red denotes a higher confidence value. (Color figure online)

According to Eq. (5), the sample patch with the highest confidence is taken as the tracking result in the current frame.

$$ s_{t} = \mathop{\arg \max }\limits_{i} \left( \varsigma_{i} \right) $$
(5)

where \( \varsigma_{i} \) denotes the confidence value of the i-th sample patch, and \( s_{t} = \{ x_{t}, y_{t}, w_{t}, h_{t} \} \) is the target state in frame t.

In addition, we add a random disturbance \( (w_{r}, h_{r}) \) to the size \( (w_{i}, h_{i}) \) of each sample patch to accommodate scale changes of the target during tracking. In this paper, both \( w_{r} \) and \( h_{r} \) follow a normal distribution with mean 0 and variance 0.1.
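Putting Sect. 3.2 together, one tracking step might look like the sketch below. Here `score_fn` stands for a forward pass through the tracking network, and `crop_and_normalize` is the helper from the sketch in Sect. 3.1; reading N as the half side length of the search window and the variance 0.1 as a standard deviation of \( \sqrt{0.1} \) are our interpretations of the text.

```python
import numpy as np

def track_frame(frame, score_fn, state, N=10, seed=0):
    """One tracking step by local dense sampling (Sect. 3.2).

    state    : (x, y, w, h) of the previous frame
    score_fn : assumed callable mapping a 32 x 32 patch to a confidence
               value in [0, 1] (the softmax output of the tracking network)
    N        : half side length of the square search area, in pixels
    """
    rng = np.random.default_rng(seed)
    x, y, w, h = state
    conf_map = np.zeros((2 * N + 1, 2 * N + 1))   # local confidence map
    best_conf, best_state = -1.0, state
    for dy in range(-N, N + 1):                   # every pixel is a sample center
        for dx in range(-N, N + 1):
            wi = w + rng.normal(0.0, np.sqrt(0.1))    # random scale disturbance
            hi = h + rng.normal(0.0, np.sqrt(0.1))
            patch = crop_and_normalize(frame, x + dx, y + dy, wi, hi)
            c = score_fn(patch)
            conf_map[dy + N, dx + N] = c
            if c > best_conf:                     # Eq. (5): arg max over samples
                best_conf = c
                best_state = (x + dx, y + dy, wi, hi)
    return best_state, best_conf, conf_map
```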

3.3 Online Adaptive Model Update

In long-term tracking, the target is susceptible to illumination variation, deformation, background clutter, and so on, which easily causes tracking drift. In such cases the tracking network parameters need to be updated. The update criterion of the tracking network is:

$$ \max \left( \varsigma_{i} \right) < \tau_{1} \quad \text{or} \quad fn \ge \eta $$
(6)

where \( \tau_{1} \) is the network update threshold, \( fn \) is the number of frames accumulated since the last update, and \( \eta \) is the maximum number of accumulated frames.

The update strategy establishes a sliding time window of positive samples [22] and places the tracking results of the current frame and its 9 adjacent frames into the window, as shown in Fig. 5. The positive samples in the sliding window are replaced and updated in real time. When Eq. (6) is satisfied, we resample 100 negative samples in the current frame and take them, together with the 10 positive samples of the initial frame and the 10 positive samples of the sliding time window, as labeled data to retrain the tracking network and update its parameters.
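A minimal sketch of the sliding time window and the retraining trigger of Eq. (6), using the threshold values of Sect. 4 as defaults:

```python
from collections import deque

class PositiveSampleWindow:
    """Sliding time window holding the 10 most recent positive samples."""

    def __init__(self, capacity=10):
        self.window = deque(maxlen=capacity)   # oldest sample is dropped first

    def push(self, patch):
        self.window.append(patch)              # real-time replacement

    def samples(self):
        return list(self.window)

def need_model_update(max_conf, frames_since_update, tau1=0.9, eta=50):
    # Eq. (6): retrain when confidence drops or the model becomes stale.
    return max_conf < tau1 or frames_since_update >= eta
```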

Fig. 5. Sliding time window of positive samples.

Meanwhile, the initial local search area may fail to cover the correct target when the target is occluded, so the search area needs to be expanded so that the target can still be tracked correctly. The update criterion of the search area is:

$$ \max \left( \varsigma_{i} \right) < \tau_{2} $$
(7)

where \( \tau_{2} \) is the threshold for updating the search area.

The search area is then updated as:

$$ N = N + \delta $$
(8)

where N is the side length of the square search area (initially set to 10 pixels) and \( \delta \) is the increment of N.
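The corresponding rule, Eqs. (7)–(8), amounts to a one-liner (default values of \( \tau_{2} \) and \( \delta \) taken from Sect. 4):

```python
def update_search_area(max_conf, N, tau2=0.5, delta=5):
    # Eqs. (7)-(8): enlarge the square search area when the target is
    # likely lost, e.g. under occlusion.
    return N + delta if max_conf < tau2 else N
```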

3.4 Overall Process of Proposed Algorithm

We present the main steps of the proposed tracking algorithm in Table 1; the corresponding flow chart is shown in Fig. 6.

Table 1. The main steps of the DLST algorithm.
Fig. 6. Flow chart of the proposed tracking algorithm.

4 Experiments

The proposed tracking algorithm is implemented in MATLAB on a platform with an Intel Xeon 2.4 GHz CPU and a TITAN X GPU. We empirically compare our tracker with several state-of-the-art trackers on the OTB2013 benchmark dataset [15], which includes 51 fully annotated sequences. The compared trackers are SST [23], SCM [24], Struck [9], DLT [11], LLC [25], CN [26], MIL [7], and NRMLC [27]; their results are provided by the respective authors.

The experimental parameters of our tracker are set as follows: \( \lambda = 0.005 \), \( \mu = 0.2 \), \( \eta = 50 \), \( \tau_{1} = 0.9 \), \( \tau_{2} = 0.5 \), \( \delta = 5 \). In the experiments, we use the one-pass evaluation (OPE) method and the evaluation indicators described in [15].

4.1 Qualitative Comparison

We use all 51 sequences of OTB2013 in our experiments. Tracking results on 9 challenging sequences are shown in Fig. 7. We analyze the performance in the following scenarios:

Fig. 7. Qualitative comparison of 9 trackers (denoted in different colors). (Color figure online)

(1) Illumination variation: There are severe illumination changes in “Car4”, “Singer2”, and “Trellis”. Compared with the other trackers, the proposed tracker follows the targets more accurately. In “Car4”, our tracker also adapts better to the scale changes of the target throughout the sequence.

(2) Occlusion and rotation: The targets are partially or completely occluded in “Suv” and “Tiger2”. Our tracker follows the target continuously from beginning to end. In “Fleetface” and “Tiger2”, out-of-plane or in-plane rotation increases the difficulty of tracking, yet our tracker still provides relatively accurate results.

(3) Fast motion and motion blur: In “Boy” and “Basketball”, the target moves very fast, even causing motion blur. The proposed tracker follows the target more reliably and accurately than the others.

(4) Deformation and background clutter: There are deformation and backgrounds similar to the target in “Basketball” and “Freeman4”, which challenges the robustness of the features extracted by the trackers. The tracking results show that our tracker extracts more robust and powerful features and follows the correct target stably.

4.2 Quantitative Comparison

For quantitative comparison, the precision plots and success plots of the trackers over all 51 sequences of OTB2013 are given in Fig. 8. Our tracker ranks first in both plots and outperforms the other state-of-the-art trackers in overall performance. In the precision plots, our tracker achieves 0.660, which is higher than DLT (the most similar deep learning based tracker) by 12.4%. In the success plots, our tracker achieves 0.501, an improvement of 14.9% over DLT.

Fig. 8. Precision plots and success plots of 9 trackers on OTB2013.

Tables 2 and 3 show the precision values and success rates of the 9 trackers on 11 different attributes, respectively. In both tables, the abbreviations denote the attributes defined in [15]: IV (illumination variation), SV (scale variation), OCC (occlusion), BC (background clutters), DEF (deformation), MB (motion blur), FM (fast motion), IPR (in-plane rotation), OPR (out-of-plane rotation), OV (out of view), and LR (low resolution). The number below each abbreviation is the number of sequences with that attribute in OTB2013. The best results are shown in red and the second best in green. From Tables 2 and 3 we observe that our tracker achieves the best or second-best result on 8 attributes; only on BC and LR does it fail to rank in the top 3. These results show that our tracker performs favorably in different challenging environments compared with the contrasted trackers.

Table 2. The precision values of 9 trackers on 11 attributes.
Table 3. The success rates of 9 trackers on 11 attributes.

4.3 Tracking Speed Comparison

FPS (frames per second) measures the tracking speed and reflects the time complexity of a tracker. Table 4 shows the tracking speed of the 9 trackers. Our proposed tracker achieves an average of 16.5 FPS in our experimental environment, which is faster than DLT and other similar deep learning based trackers such as DeepTrack (2.5 FPS) [28].

Table 4. The tracking speed comparison for the 9 trackers.

5 Conclusions

In this paper, we propose a robust and fast visual tracking algorithm based on deep sparse neural networks. By combining ReLU with stacked SAEs, the deep sparse network avoids complex and time-consuming pre-training and realizes online-only training and tracking. Data augmentation of the single positive sample relieves the imbalance between positive and negative samples, which improves the reliability of the deep network. Meanwhile, the local dense searching method and the adaptive update strategy solve the problems of particle drift and randomness. Extensive experimental results on the OTB2013 dataset show that the proposed algorithm achieves state-of-the-art results in complicated environments while realizing a practical tracking speed.

However, there are still several directions in which our algorithm can be improved. For example, our tracker is not robust enough when the target's scale changes significantly or when complete occlusion lasts too long. Therefore, scale adaptability and long-term occlusion will be the focus of our future work.