Self-Paced Densely Connected Convolutional Neural Network for Visual Tracking

Ge, Daohui; Song, Jianfeng; Qi, Yutao; Wang, Chongxiao; Miao, Qiguang

doi:10.1007/978-3-030-03341-5_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11259)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Convolutional neural networks (CNNs) have achieved surprising results in visual tracking. To address the model drift problem, we propose a novel self-paced densely connected convolutional neural network (SPDCT) that distinguishes reliable data from noisy and confusing data. In the proposed model, each sample is given a weight, estimated by SPDCT, that indicates the reliability of the sample. The self-paced learning framework is then integrated into the online update phase to improve the robustness of CNNs. To determine the pace parameter of self-paced learning effectively, we propose an adaptive method based on the number of training samples. Meanwhile, to strengthen the representation power of the features, we enhance feature reuse and information flow by applying densely connected learning. Extensive experimental results demonstrate that the proposed tracker performs competitively against a number of state-of-the-art algorithms.

1 Introduction

Visual tracking is a fundamental problem in computer vision and has been widely applied in video surveillance, robotics, medical imaging, and so on. Given the initial bounding box of the target, visual tracking estimates the location and scale of the target in subsequent frames. Although visual tracking has been studied for several years, it remains extremely challenging because of appearance changes, partial occlusion, motion blur, and background clutter. CNN-based trackers have drawn increasing attention and have achieved excellent results in visual tracking.

Most existing trackers adopt an online update strategy to capture appearance changes. FCNT [19] updates only the specific network, using the most confident tracking result, to avoid introducing background noise. CREST [15] collects all estimated locations and updates the model at a fixed frame interval. Online updating, however, brings the model drift problem because of factors such as tracking failure, occlusion, and inaccurate scale estimation.

One way to improve the robustness of the online update is to reduce the amount of noisy and confusing data that is introduced. Self-paced learning (SPL), which was proposed recently, is a representative approach to such robust learning. SPL originates from curriculum learning (CL) [1], proposed by Bengio et al. In CL, a curriculum is defined as a set of training samples organized in ascending order of learning difficulty. However, the curriculum stays fixed across iterations and is not affected by subsequent learning. Inspired by the learning process of humans and animals, Kumar et al. proposed SPL, which generates a dynamic curriculum according to what the model has already learned. SPL helps to avoid bad local minima and to reach a more reasonable solution. Based on this analysis, we propose a novel self-paced sample space model that distinguishes reliable data from noisy and confusing data to avert model drift.

Current trackers extract features with existing deep networks that have been pre-trained offline on a large amount of data. In the traditional CNN structure, only the output of the immediately preceding layer is used as the input of the current layer, so other existing features are discarded. Huang et al. [5] proposed a densely connected network (DenseNet) that adds shortcut connections to enhance the information flow between layers and the feature reuse of the network. Inspired by DenseNet, we apply densely connected learning to reduce the dependency between adjacent layers in CNNs and to improve the feature representations.

The contributions of this paper are threefold: (i) We propose a novel self-paced sample space model that integrates the SPL framework into visual tracking. It avoids drift during online update by choosing reliable data from noisy and confusing data. (ii) We apply densely connected learning to enhance the information flow and feature reuse of the network, which effectively strengthens the representation power of the features. (iii) We conduct extensive experiments on benchmark datasets. The results show that our tracker achieves state-of-the-art performance.

The rest of this paper is organized as follows. We first introduce existing visual tracking algorithms and the self-paced learning framework in Sect. 2. Then, our SPDCT model and visual tracking algorithm are discussed in Sect. 3 and Sect. 4. Experiments are detailed in Sect. 5, and concluding remarks are given in Sect. 6.

2 Related Work

CNN-Based Tracking. The capability of feature representations is very important for visual tracking. Deep neural networks, especially CNNs, are developing rapidly and have been successfully applied to visual tracking [11, 19, 20]. FCNT [19] employs a fully convolutional network and proposes a feature map selection method to improve tracking accuracy. HCFT [11] adopts hierarchical features to train correlation filters. STCT [20] casts online CNN training as ensemble learning to reduce over-fitting. Other CNN-based trackers argue that transferring pre-trained deep features may not be appropriate for online tracking. The methods mentioned above directly employ the traditional CNN structure to capture the appearance change of the target. Different from existing tracking methods based on convolutional neural networks, we propose a densely connected learning method that improves the robustness of the visual representation through feature reuse.

Self-Paced Learning. Self-paced learning [10, 12] learns the model iteratively from easy to complex samples, inspired by the learning process of humans and animals. Compared with other machine learning methods, SPL jointly learns the curriculum and the model parameters by incorporating a self-paced function and a pace parameter into the objective function. When the pace is small, only 'easy' samples with small losses are chosen as training data. As the pace grows, more samples with larger losses are gradually added to train a more 'mature' model. Ma et al. [12] proved that the learning process of the traditional SPL regime is guaranteed to converge to rational critical points of the corresponding implicit NCRP objective. SPL has been successfully applied to various tasks, such as action and event detection [7], reranking [6], segmentation [8], and co-saliency detection [22]. Supancic and Ramanan [16] employ self-paced learning to solve long-term tracking problems. Like [16], we use self-paced learning to select reliable frames. Unlike [16], this paper also adopts a feature pyramid method to fuse multi-layer CNN features and densely connected learning to improve the robustness of the feature representation.

3 Proposed Visual Tracking Method

The main idea of the self-paced densely connected convolutional neural network is to integrate the SPL framework into the visual tracking algorithm. Specifically, SPDCT distinguishes reliable data from noisy data and then uses the reliable data to update the tracker, which keeps the model robust. Figure 1 shows our SPDCT model pipeline. The details are discussed as follows.

3.1 Self-paced Sample Space Model

We propose a self-paced sample space model (SPSS) to avoid introducing background noise during online update. Formally, we denote the training dataset as $D = \left\{ \left( x_{1}, y_{1} \right) , \left( x_{2}, y_{2} \right) ,...,\left( x_{n},y_{n} \right) \right\} $, where $x_{i}$ and $y_{i}$ denote the observed samples and the corresponding labels, respectively. The model can be formulated as the following optimization problem:

$$\begin{aligned} \min \limits _{w,v}E\left( w,v \right) = \sum _{i=1}^{n}v_{i}L\left( y_{i} ,g\left( x_{i};w,b\right) \right) +f\left( \lambda ,v \right) \end{aligned}$$

(1)

where $L$ denotes the quadratic loss function evaluated on the estimated response value $g\left( x_{i};w,b \right) $ with the weight vector w and bias parameter b. $v = \left[ v_{1},v_{2},...,v_{n} \right] , v\in \left[ 0,1 \right] ^{n}$ denotes the importance weights of all training samples, $v_{i} = 1$ indicates a reliable sample, and $\lambda $ is the pace parameter that controls the selection pace. The capability of the self-paced sample space model is determined by the self-paced function, which avoids the negative influence of outliers with large noise. The self-paced function takes the following form:

$$\begin{aligned} f\left( \lambda ,v \right) = -\lambda \left\| v \right\| _{1} = -\lambda \sum _{i=1}^{n}v_{i} \end{aligned}$$

(2)

Similar to SPL, the optimization problem in Eq. 1 can be solved by alternately optimizing the importance weights v and the model parameters $\{w,b\}$. With v fixed, $\{w,b\}$ can be optimized by existing off-the-shelf supervised learning methods, such as the back-propagation algorithm. With $\{w,b\}$ fixed, $v = \left[ v_{1},v_{2},...,v_{n}\right] $ can be easily calculated by

$$\begin{aligned} v_{i}^{*} = {\left\{ \begin{array}{ll} 1, & L\left( y_{i} ,g\left( x_{i};w,b \right) \right) < \lambda \\ 0, & \text {otherwise} \end{array}\right. } \end{aligned}$$

(3)
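The closed form in Eq. 3 follows because, for a fixed model, the objective in Eq. 1 is linear in each $v_{i}$ with coefficient $L_{i} - \lambda $, so its minimum over $[0,1]$ lies at $v_{i} = 1$ when the loss is below $\lambda $ and at $v_{i} = 0$ otherwise. The alternating scheme can be illustrated with a minimal sketch. It is not the authors' implementation: the model is reduced to a single scalar w fitted by a weighted mean (a toy stand-in for back-propagation), and the data, the initial value `w_init` and the value of `lam` are hypothetical.

```python
def quadratic_loss(y, prediction):
    """Quadratic loss L(y, g(x)) for one sample."""
    return (y - prediction) ** 2


def update_sample_weights(losses, lam):
    """Eq. 3: v_i = 1 when the loss is below the pace parameter, else 0."""
    return [1 if loss < lam else 0 for loss in losses]


def fit_constant(ys, v, fallback):
    """Toy stand-in for back-propagation: the scalar w that minimises
    sum_i v_i * (y_i - w)^2 is the mean of the selected targets."""
    selected = [y for y, vi in zip(ys, v) if vi == 1]
    return sum(selected) / len(selected) if selected else fallback


def self_paced_fit(ys, lam, w_init, rounds=5):
    """Alternate between the v-step (Eq. 3) and the w-step."""
    w = w_init
    v = [0] * len(ys)
    for _ in range(rounds):
        losses = [quadratic_loss(y, w) for y in ys]
        v = update_sample_weights(losses, lam)   # model fixed, update v
        w = fit_constant(ys, v, fallback=w)      # v fixed, update model
    return w, v


# Four consistent observations and one outlier (hypothetical data).
ys = [1.0, 1.1, 0.9, 1.05, 5.0]
w, v = self_paced_fit(ys, lam=0.5, w_init=1.0)
print(v)  # [1, 1, 1, 1, 0]
```

In this toy run the outlier keeps a weight of 0 and never contributes to the fit, which is the behaviour the sample space model relies on to keep noisy frames out of the update.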

In traditional SPL methods, the pace parameter is increased by a fixed value at each iteration so that harder samples are chosen, and it is difficult to determine this fixed value effectively. In this paper, we propose an adaptive strategy based on the number of samples. In the $t^{th}$ iteration, $N_{t}$ denotes the total number of training samples and $N_{p}$ denotes the proportion of samples selected. We first obtain $L_{sort}$ by sorting the sample losses in ascending order, and the $\left( N_{t}*N_{p} \right) ^{th}$ loss value of $L_{sort}$ is used as the pace parameter of SPL, as shown in Eq. 4.

$$\begin{aligned} \lambda = L_{sort}\left[ \left( N_{t}*N_{p} \right) \right] \end{aligned}$$

(4)
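A short sketch of Eq. 4 may help make the indexing concrete. It assumes that $L_{sort}$ holds the per-sample losses in ascending order, that the index is 1-based, and that a fractional $N_{t}*N_{p}$ is rounded down; the rounding rule and the example losses are assumptions, not details given in the text.

```python
def adaptive_pace(losses, proportion):
    """Return lambda = L_sort[N_t * N_p] (Eq. 4), using a 1-based rank."""
    if not losses:
        raise ValueError("at least one loss value is required")
    l_sort = sorted(losses)                 # ascending order
    n_t = len(l_sort)
    rank = int(n_t * proportion)            # rounding down is an assumption
    rank = min(max(rank, 1), n_t)           # keep the rank inside the list
    return l_sort[rank - 1]


# Hypothetical losses for six collected samples.
losses = [0.30, 0.05, 0.90, 0.10, 0.20, 0.02]
lam = adaptive_pace(losses, proportion=0.5)
selected = [loss for loss in losses if loss < lam]   # Eq. 3 with this lambda
print(lam)       # 0.1
print(selected)  # [0.05, 0.02]
```

Because Eq. 3 uses a strict inequality, the sample whose loss equals $\lambda $ is itself excluded under this reading, so slightly fewer than $N_{t}*N_{p}$ samples are kept.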

Fig. 2 compares the SPDCT algorithm (bottom row) with the CREST method (top row). In CREST, the training data is composed of consecutive video frames, so the model easily overfits to the current frames. For example, when occlusion occurs, CREST learns more background information, which causes tracking drift. In contrast, our model chooses reliable training samples through the SPSS model to avoid introducing background noise.

3.2 Densely Connected Learning

The quality of the features determines the performance of a tracker based on convolutional neural networks. Most CNN-based trackers directly employ the traditional CNN structure to capture the appearance change of the target. In the traditional CNN structure, only the output of the previous layer is used as the input of the current layer, which discards existing features and hinders the information flow in the network. To enhance feature reuse and reduce the dependence between adjacent layers, we replace the traditional CNN with a densely connected convolutional network, which connects each layer to every other layer in a feed-forward fashion. Figure 3 shows the structure of the densely connected learning. The $l^{th}$ layer receives the feature maps generated by all previous layers as input. This form of densely connected learning can be formulated as follows:

$$\begin{aligned} x_{l} = H_{l}\left( \left[ x_{0},x_{1},...,x_{l-1} \right] \right) \end{aligned}$$

(5)

where $H_{l}\left( x \right) $ denotes the non-linear transformation function composed of convolution (Conv) and rectified linear units (ReLU). Similar to [5], we concatenate the multiple inputs of $H_{l}\left( x \right) $. $x_{l}$ is the output of the $l^{th}$ layer. We adopt four layers in the densely connected learning with a small growth rate.
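The connectivity pattern of Eq. 5 can be sketched in a few lines. This is an illustration only: each feature map is reduced to a single feature vector, $H_{l}$ is replaced by a random linear map followed by ReLU instead of a real convolution, and the growth rate of 2 and the standard deviation of 0.1 are hypothetical values chosen for the example.

```python
import random


def relu(vec):
    return [max(0.0, x) for x in vec]


def linear(weights, vec):
    """Matrix-vector product; stands in for a convolution at one position."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def make_layer(in_dim, growth_rate, rng):
    """Zero-mean Gaussian initialisation (the std of 0.1 is an assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(growth_rate)]


def dense_block(x0, num_layers=4, growth_rate=2, seed=0):
    """Eq. 5: x_l = H_l([x_0, x_1, ..., x_{l-1}])."""
    rng = random.Random(seed)
    features = [x0]
    for _ in range(num_layers):
        concat = [x for f in features for x in f]      # [x_0, ..., x_{l-1}]
        weights = make_layer(len(concat), growth_rate, rng)
        features.append(relu(linear(weights, concat)))
    return features


feats = dense_block([0.5, -1.0, 2.0])
# Input width seen by layers 1..4: it grows by the growth rate each time.
input_dims = [sum(len(f) for f in feats[:l]) for l in range(1, len(feats))]
print(input_dims)  # [3, 5, 7, 9]
```

The printed widths show why a small growth rate matters: every layer adds only a few channels, yet each later layer still sees all earlier outputs.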

Densely connected layers enhance feature reuse and maximize the information flow through the network. According to the results in Sect. 5, the learned features are more robust to appearance changes.

3.3 Multi-layer Features Fusion

According to FCNT [19], convolutional layers at different levels capture different aspects of the target. A top layer encodes more semantic features in a low-resolution map, while a lower layer carries more spatial information in a high-resolution map. To preserve both the spatial and the semantic information of the features, we adopt the feature pyramid method described in feature pyramid networks (FPN) [9] to achieve multi-layer fusion, as shown in Fig. 4.
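The top-down step of such a fusion can be sketched as follows. The sketch assumes single-channel maps, a coarse map with exactly half the resolution of the fine map, nearest-neighbour upsampling, and element-wise addition; the lateral and smoothing convolutions used in FPN are left out, so this is a simplified illustration and not the fusion module of the paper.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_top_down(low_level, high_level):
    """Add the upsampled semantic map to the high-resolution map."""
    up = upsample_nearest_2x(high_level)
    if len(up) != len(low_level) or len(up[0]) != len(low_level[0]):
        raise ValueError("high_level must be half the size of low_level")
    return [[a + b for a, b in zip(r_low, r_up)]
            for r_low, r_up in zip(low_level, up)]


# Hypothetical 4x4 high-resolution map and 2x2 low-resolution map.
low = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
high = [[10, 20],
        [30, 40]]
fused = fuse_top_down(low, high)
print(fused[0])  # [11, 12, 23, 24]
```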

4 Tracking with SPDCT

We describe the detailed procedure of SPDCT, which comprises model initialization, detection, scale estimation, and online update, as listed in Algorithm 1.

Model Initialization. Similar to CREST [15], given the first frame with the target location, we extract a training patch centered on the target location and send the patch to an existing deep neural network to extract the features. Soft labels are used to train the weight and bias parameters of the densely connected layers. All the parameters in the densely connected layers are randomly initialized from a zero-mean Gaussian distribution.

Detection. When a new frame arrives, we crop a search patch centered on the tracking result of the previous frame. The patch has the same size as the training data. We obtain the response map through the densely connected layers and locate the target at the position of the maximum response value. The online tracking strategy is thus simple and straightforward.
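The detection step amounts to an argmax over the response map followed by a shift of the previous centre. The sketch below assumes that the search patch is centred on the previous target position and that one response-map cell corresponds to `stride` frame pixels; `stride` and the example response map are hypothetical.

```python
def locate_peak(response_map):
    """Return (row, col, value) of the maximum response."""
    best = None
    for r, row in enumerate(response_map):
        for c, value in enumerate(row):
            if best is None or value > best[2]:
                best = (r, c, value)
    if best is None:
        raise ValueError("response map is empty")
    return best


def update_center(prev_center, peak_rc, map_shape, stride=1.0):
    """Shift the previous centre (y, x) by the peak's offset from the
    centre of the response map, scaled to frame pixels."""
    center_r = (map_shape[0] - 1) / 2
    center_c = (map_shape[1] - 1) / 2
    dy = (peak_rc[0] - center_r) * stride
    dx = (peak_rc[1] - center_c) * stride
    return prev_center[0] + dy, prev_center[1] + dx


response = [[0.0, 0.1, 0.0, 0.0, 0.0],
            [0.0, 0.2, 0.3, 0.9, 0.1],
            [0.1, 0.2, 0.5, 0.3, 0.0],
            [0.0, 0.1, 0.1, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 0.0]]
row, col, value = locate_peak(response)
new_center = update_center((100.0, 200.0), (row, col), (5, 5), stride=4.0)
print(row, col, value)  # 1 3 0.9
print(new_center)       # (96.0, 204.0)
```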

Scale Estimation. Once we obtain the center location of the target, we crop the frame at different scales to get several patches. We send these patches to SPDCT to obtain their response values, and we estimate the scale of the target by searching for the maximum response value.
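The scale search can be sketched as a loop over candidate scales. The callable `max_response_at` is a hypothetical placeholder for "crop the frame at this scale, run SPDCT, and return the peak response"; the three candidate scales and the response values are also assumptions made for the example.

```python
def estimate_scale(max_response_at, scales=(0.95, 1.0, 1.05)):
    """Return the candidate scale with the largest peak response."""
    best_scale, best_score = None, float("-inf")
    for scale in scales:
        score = max_response_at(scale)
        if score > best_score:
            best_scale, best_score = scale, score
    return best_scale, best_score


# Stand-in for the tracker: peak responses observed at each scale.
fake_responses = {0.95: 0.61, 1.0: 0.74, 1.05: 0.80}
scale, score = estimate_scale(lambda s: fake_responses[s])

# Apply the chosen scale to the previous target size (width, height).
new_size = (40 * scale, 80 * scale)
print(scale, score)  # 1.05 0.8
```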

Online Update. We adopt the self-paced sample space model to obtain reliable training data for the model update. We first collect tracking results as training samples. For each frame, the corresponding soft label is generated according to the predicted location. After obtaining the response map of the target, we calculate the sample weights by Eq. 3 and choose the samples with $v = 1$ for the online update. To reduce over-fitting to recent samples and to satisfy the memory constraint, we select a maximum of N samples at a time and update the model online at a fixed frame interval.
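The bookkeeping of the online update can be sketched as below. The values N = 11, an update every 5 frames and $N_{p} = 0.5$ follow the implementation details reported in Sect. 5.1; keeping the N most recent samples in a first-in-first-out buffer, storing one loss per sample instead of recomputing it with the current model, and rounding the rank down are assumptions of this sketch. The class name `OnlineUpdater` is hypothetical.

```python
from collections import deque


class OnlineUpdater:
    def __init__(self, max_samples=11, update_interval=5, proportion=0.5):
        self.buffer = deque(maxlen=max_samples)   # keeps the newest N samples
        self.update_interval = update_interval
        self.proportion = proportion

    def reliable_frames(self):
        """Pick lambda by Eq. 4, then keep samples with v_i = 1 (Eq. 3)."""
        losses = sorted(loss for _, loss in self.buffer)
        rank = min(max(int(len(losses) * self.proportion), 1), len(losses))
        lam = losses[rank - 1]
        return [idx for idx, loss in self.buffer if loss < lam]

    def step(self, frame_idx, loss):
        """Store the new sample; return the frames to train on when an
        update is due, otherwise None."""
        self.buffer.append((frame_idx, loss))
        if frame_idx % self.update_interval == 0:
            return self.reliable_frames()
        return None


# Hypothetical per-frame losses; frames 8-10 mimic an occlusion.
frame_losses = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 1.0, 1.0, 1.0]
updater = OnlineUpdater()
updates = {}
for frame_idx, loss in enumerate(frame_losses, start=1):
    chosen = updater.step(frame_idx, loss)
    if chosen is not None:
        updates[frame_idx] = chosen
print(updates)  # {5: [1], 10: [1, 2, 3, 4]}
```

In this toy run the high-loss frames 8 to 10 are never selected, so the update at frame 10 trains only on frames recorded before the simulated occlusion.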

5 Experiments

In this section, we first explain the implementation details and then analyze the effects of the self-paced sample space model and densely connected learning. We validate the performance of our SPDCT tracker against state-of-the-art trackers on three benchmark datasets: OTB-50, OTB-100 [21] and UAV123 [13].

5.1 Experiments Setups

Implementation Details. Consistent with existing trackers, we use the VGG model as the feature extractor and take features from the outputs of its conv3-3 and conv4-3 layers. In the first frame, we obtain a training sample five times the size of the target bounding box. The soft label is a two-dimensional Gaussian function with a peak value of 1, and the learning rate is set to 5e−7. In the online update, we calculate $\lambda $ by Eq. 4 and choose the reliable data with the adaptive percentage $N_{p}$ of 0.5. N is set to 11. The SPDCT model is fine-tuned for 2 iterations with a learning rate of 1e−8 every 5 frames. SPDCT is implemented in MATLAB on top of the MatConvNet wrapper [18].
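A soft label of the kind described above can be generated as in the following sketch. The label is centred on the map and peaks at 1; the bandwidth `sigma` and the 5 x 5 map size are hypothetical, since the text does not specify them, and Python is used here only for illustration (the tracker itself is implemented in MATLAB).

```python
import math


def gaussian_soft_label(height, width, sigma):
    """2-D Gaussian label map with peak value 1 at the map centre."""
    cy, cx = (height - 1) / 2, (width - 1) / 2
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


label = gaussian_soft_label(5, 5, sigma=1.0)
print(label[2][2])  # 1.0
```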

Benchmark Datasets. We conduct our experiments on three benchmark datasets: OTB-50, OTB-100 [21] and UAV123 [13]. The OTB-50 and OTB-100 datasets contain 50 and 100 real-world targets for tracking, respectively. The sequences are annotated with 11 attributes, such as occlusion, scale variation, motion blur, and background clutter. The UAV123 dataset consists of 123 aerial videos with more than 110K frames.

Evaluation Methodology. We use the one-pass evaluation (OPE) with precision and success plots to evaluate the current state-of-the-art trackers. The precision plot shows the percentage of frames in which the distance between the estimated location and the ground truth is within 20 pixels. The success plot shows the percentage of frames in which the overlap between the estimated box and the ground-truth box exceeds a given threshold. All the trackers are ranked according to the area under the curve (AUC) of each success plot.
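The two metrics can be written down compactly. The sketch assumes boxes in (x, y, width, height) format, intersection-over-union as the overlap measure, and 21 evenly spaced overlap thresholds in [0, 1] for the AUC; the threshold grid and the example boxes are assumptions, and this is not the benchmark toolkit itself.

```python
import math


def center_distance(a, b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision(preds, gts, threshold=20.0):
    """Fraction of frames whose centre error is within the threshold."""
    hits = sum(center_distance(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


def success_auc(preds, gts, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    rates = []
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)
        rates.append(sum(o > t for o in overlaps) / len(overlaps))
    return sum(rates) / len(rates)


# Three hypothetical frames: exact hit, partial overlap, complete miss.
gts = [(10, 10, 20, 20)] * 3
preds = [(10, 10, 20, 20), (20, 10, 20, 20), (60, 10, 20, 20)]
print(round(precision(preds, gts), 3))    # 0.667
print(round(success_auc(preds, gts), 3))  # 0.429
```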

5.2 Ablation Studies

The SPDCT algorithm consists of the self-paced sample space model and densely connected learning. We conduct ablation studies on the OTB-50 dataset to analyze the effect of each part. We set up four contrast experiments: the standard SPDCT tracker, the SPDCT tracker without the self-paced sample space model (SPDCT-spss free), the SPDCT tracker without the densely connected model (SPDCT-densely learning free), and SPDCT with neither the self-paced sample space model nor densely connected learning (SPDCT-neither).

Figure 5 shows the precision and success plots of the above ablation experiments. The experimental results show that both the self-paced sample space model and densely connected learning help to improve the performance. The self-paced sample space model enhances the ability of the tracker to discern the target, because selecting reliable samples for the model update avoids introducing noise. The densely connected learning model enriches the input of the convolutional layers by reusing the convolutional features, alleviates over-fitting, and enhances the representation power of the features. In the precision plots, SPDCT-spss free performs worse than SPDCT-neither because densely connected learning has a higher learning capacity: when there is noise in a training sample, the model learns features unrelated to the target and loses its representation power. The standard SPDCT has the best results.

Fig. 7. Precision and success plots on the OTB-100 dataset.

Fig. 8. Precision and success plots on the UAV123 dataset.

Fig. 9. Qualitative evaluation of our SPDCT tracker, DeepSRDCF, HCFT, and CREST on three challenging sequences (from top to bottom: ClifBar, Human3, Ironman).

5.3 Comparisons to State-of-the-art Trackers

In this section, we compare the SPDCT model with recent state-of-the-art trackers, including HDT [14], CREST [15], SRDCFdecon [4], DeepSRDCF [2], HCFT [11], SINT [17], FCNT [19], MEEM [23], SRDCF [3], and 29 other trackers from the OTB-2015 benchmark [21] and UAV123 [13]. We initialize the model randomly and then use the first frame of the video as the training sample. The ten best results are shown in Figs. 6, 7 and 8. On OTB-50, OTB-100 and UAV123, the experimental results show that our SPDCT tracking algorithm performs among the best of these trackers. On OTB-50, the precision score is 3.5% and 3.6% higher than those of HDT and HCFT, respectively, and the success score is 0.7% and 0.9% higher than those of CREST and BACF, respectively. On OTB-100, the precision score is 1.7% and 2.0% higher than those of DeepSRDCF and HDT, respectively, and SPDCT ranks fourth on the success plots. On UAV123, the precision score is 2.1% higher than that of SRDCF, and SPDCT ranks second on the success plots. Our SPDCT model does not use any auxiliary training data. We account for the reliability of the samples through the self-paced sample space model and for feature reuse through densely connected learning, which improves the robustness of the model. The results reach state-of-the-art performance and show that our SPDCT model has good generalization ability. Figure 9 visualizes qualitative evaluation results. We compare three top-performing trackers, DeepSRDCF, HCFT and CREST, with our SPDCT tracker on three challenging sequences. The results show that our SPDCT model achieves state-of-the-art tracking performance.

6 Conclusion

In this paper, we have proposed a novel self-paced sample space model that integrates the SPL framework into visual tracking to distinguish reliable data from noisy and confusing data and thereby avoid the model drift problem. We also apply densely connected learning to improve the information flow and feature reuse of the network, which effectively enhances the representation power of the features. Experiments on three benchmark datasets demonstrate that our SPDCT model achieves state-of-the-art performance. In the future, we will consider how to effectively construct diverse samples for visual tracking within the self-paced learning framework.

References

1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48. ACM (2009)
2. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 58–66 (2015)
3. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4310–4318 (2015)
4. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Adaptive decontamination of the training set: a unified formulation for discriminative visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1430–1438 (2016)
5. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 3 (2017)
6. Jiang, L., Meng, D., Mitamura, T., Hauptmann, A.G.: Easy samples first: self-paced reranking for zero-example multimedia search. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 547–556. ACM (2014)
7. Jiang, L., Meng, D., Yu, S.I., Lan, Z., Shan, S., Hauptmann, A.: Self-paced learning with diversity. In: Advances in Neural Information Processing Systems, pp. 2078–2086 (2014)
8. Kumar, M.P., Turki, H., Preston, D., Koller, D.: Learning specific-class segmentation from diverse data. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1800–1807. IEEE (2011)
9. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, vol. 1, p. 4 (2017)
10. Liu, S., Ma, Z., Meng, D.: Understanding self-paced learning under concave conjugacy theory. Commun. Inf. Syst. 18, 1–35 (2018)
11. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision (2015)
12. Ma, Z., Liu, S., Meng, D.: On convergence property of implicit self-paced objective. Inf. Sci. 462, 132–140 (2018)
13. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
14. Qi, Y., et al.: Hedging deep features for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1 (2018)
15. Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R., Yang, M.H.: Crest: convolutional residual learning for visual tracking. In: IEEE International Conference on Computer Vision, pp. 2555–2564 (2017)
16. Supancic III, J.S., Ramanan, D.: Self-paced learning for long-term tracking. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2379–2386. IEEE (2013)
17. Tao, R., Gavves, E., Smeulders, A.W.: Siamese instance search for tracking. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1420–1429. IEEE (2016)
18. Vedaldi, A., Lenc, K.: MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 689–692. ACM (2015)
19. Wang, L., Ouyang, W., Wang, X., Lu, H.: Visual tracking with fully convolutional networks. In: IEEE International Conference on Computer Vision (ICCV) (2015)
20. Wang, L., Ouyang, W., Wang, X., Lu, H.: STCT: Sequentially training convolutional networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1373–1381 (2016)
21. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
22. Zhang, D., Meng, D., Li, C., Jiang, L., Zhao, Q., Han, J.: A self-paced multiple-instance learning framework for co-saliency detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 594–602 (2015)
23. Zhang, J., Ma, S., Sclaroff, S.: MEEM: robust tracking via multiple experts using entropy minimization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 188–203. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_13

Acknowledgements

The work was jointly supported by the National Key R&D Program of China under Grant No. 2018YFC0807500, the National Natural Science Foundation of China under Grant Nos. 61772396, 61472302 and 61772392, the Fundamental Research Funds for the Central Universities under Grant Nos. JB170306, JB170304 and JBF180301, and the Xi’an Key Laboratory of Big Data and Intelligent Vision under Grant No. 201805053ZD4CG37.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, Xi’an, 710071, Shaanxi, China
Daohui Ge, Jianfeng Song, Yutao Qi, Chongxiao Wang & Qiguang Miao
Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, 710071, Shaanxi, China
Daohui Ge, Jianfeng Song, Yutao Qi, Chongxiao Wang & Qiguang Miao

Corresponding author

Correspondence to Jianfeng Song.

Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, China
Jian-Huang Lai
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi’an Jiaotong University, Xi’an, China
Nanning Zheng
Peking University, Beijing, China
Hongbin Zha

Cite this paper

Ge, D., Song, J., Qi, Y., Wang, C., Miao, Q. (2018). Self-Paced Densely Connected Convolutional Neural Network for Visual Tracking. In: Lai, JH., et al. Pattern Recognition and Computer Vision. PRCV 2018. Lecture Notes in Computer Science(), vol 11259. Springer, Cham. https://doi.org/10.1007/978-3-030-03341-5_9

DOI: https://doi.org/10.1007/978-3-030-03341-5_9
Published: 02 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03340-8
Online ISBN: 978-3-030-03341-5
eBook Packages: Computer Science (R0)
