Abstract
The purpose of single-target tracking is to locate a specific object accurately and continuously while it is moving. However, when the target undergoes fast motion, severe occlusion, or very small size, or when distractors share its local features, tracking algorithms based on correlation filters or convolutional neural networks suffer from localization errors. To address these problems, this paper designs a single-target tracking algorithm: the Relative Temporal Spatial Network (RTSnet). RTSnet is a multi-thread network composed of a Relative Temporal Information Network (RTInet) and a Relative Spatial Information Network (RSInet). RTInet is designed on the basis of an LSTM, which gives it temporal prediction capability; it mainly captures the relative temporal information of the target between consecutive frames. RSInet, an improved siamese network based on the Triplet Network, performs similarity determination to obtain the relative spatial information of the target between consecutive frames. In the experiments, RTSnet is trained on the LaSOT training set and evaluated on the LaSOT test set and the OTB100 data set. On the LaSOT test set, the accuracy of RTSnet reaches 85.5%, compared with 62.3% for Trans-T and 57.4% for STMTrack. Meanwhile, its tracking speed reaches 117.3 fps because RTSnet adopts dual-thread operation. On the OTB100 data set, the accuracy of RTSnet is 81.1%.
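As a rough illustration of the similarity-determination idea behind RSInet's triplet-based spatial branch, the sketch below implements a plain triplet margin loss over unit-normalized embeddings. The linear `embed` function and the margin value are illustrative assumptions, not the paper's actual backbone or hyperparameters.

```python
import numpy as np

def embed(x, W):
    # Toy linear embedding standing in for RSInet's feature extractor
    # (an assumption; the paper uses a triplet-style siamese network).
    v = W @ x
    return v / np.linalg.norm(v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive (the same target in an adjacent
    # frame) and push it away from the negative (a distractor patch).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))
    target_t0 = rng.standard_normal(8)                      # target patch, frame t
    target_t1 = target_t0 + 0.01 * rng.standard_normal(8)   # same target, frame t+1
    distractor = rng.standard_normal(8)                     # unrelated background patch
    a, p, n = (embed(x, W) for x in (target_t0, target_t1, distractor))
    print(triplet_loss(a, p, n))
```

Training with such a loss drives embeddings of the same target in consecutive frames together, so at test time the spatial branch can rank candidate patches by embedding distance to the template.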
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Bertinetto L, Valmadre J, Henriques JF et al (2016a) Fully-convolutional siamese networks for object tracking. European conference on computer vision, pp 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
Bertinetto L, Valmadre J, Golodetz S et al (2016b) Staple: complementary learners for real-time tracking. Computer vision and pattern recognition, pp 1401–1409
Bolme DS, Beveridge JR, Draper BA et al (2010) Visual object tracking using adaptive correlation filters. Computer vision and pattern recognition, pp 2544–2550
Chen X, Yan B, Zhu J, Wang D, Lu H (2021) Transformer tracking. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Cheng S, Zhong B, Li G, Liu X, Tang Z, Li X et al (2021) Learning to filter: siamese relation network for robust tracking
Danelljan M, Hager G, Khan FS et al (2014) Accurate scale estimation for robust visual tracking. British machine vision conference
Danelljan M, Hager G, Khan FS et al (2015) Learning spatially regularized correlation filters for visual tracking. International conference on computer vision, pp 4310–4318
Danelljan M, Bhat G, Khan FS et al (2017) ECO: efficient convolution operators for tracking. Computer vision and pattern recognition, pp 6931–6939
Fan H, Lin L, Yang F et al (2018) LaSOT: a high-quality benchmark for large-scale single object tracking. arXiv: Comp Vis Patt Recogn. https://doi.org/10.1007/s11263-020-01387-y
Fu Z, Liu Q, Fu Z, Wang Y (2021) STMTrack: template-free visual tracking with space-time memory networks
Galoogahi HK, Fagg A, Lucey S et al (2017) Learning background-aware correlation filters for visual tracking. International conference on computer vision, pp 1144–1152
Greve R, Jacobsen EJ, Risi S et al (2016) Evolving neural Turing machines for reward-based learning. Genetic and evolutionary computation conference, pp 117–124
Gulcehre C, Chandar S, Cho K et al (2016) Dynamic neural Turing machine with soft and hard addressing schemes
Guo Q, Feng W, Zhou C et al (2017) Learning dynamic siamese network for visual object tracking. International conference on computer vision, pp 1781–1789
Hare S, Golodetz S, Saffari A et al (2015) Struck: structured output tracking with kernels. IEEE Trans Patt Anal Mach Intell 38(10):2096–2109
Held D, Thrun S, Savarese S et al (2016) Learning to track at 100 FPS with deep regression networks. European conference on computer vision, pp 749–765. https://doi.org/10.1007/978-3-319-46448-0_45
Henriques JF, Caseiro R, Martins P et al (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Patt Anal Mach Intell 37(3):583–596
Henriques JF, Caseiro R, Martins P et al (2012) Exploiting the circulant structure of tracking-by-detection with kernels. European conference on computer vision, pp 702–715
Hoffer E, Ailon N (2015) Deep metric learning using triplet network. https://doi.org/10.1007/978-3-319-24261-3_7
Iandola FN, Han S, Moskewicz MW et al (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. https://doi.org/10.1007/978-1-4842-6168-2_7
Li B, Yan J, Wu W et al (2018) High performance visual tracking with siamese region proposal network. Computer vision and pattern recognition, pp 8971–8980
Li Y, Zhu J (2014) A scale adaptive kernel correlation filter tracker with feature integration. European conference on computer vision. https://doi.org/10.1007/978-3-319-16181-5_18
Lukezic A, Vojir T, Zajc LC et al (2017) Discriminative correlation filter with channel and spatial reliability. Computer vision and pattern recognition, pp 4847–4856. https://doi.org/10.1007/s11263-017-1061-3
Ma N, Zhang X, Zheng HT et al (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. https://doi.org/10.1007/978-3-030-01264-9_8
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. Computer vision and pattern recognition, pp 4293–4302
Possegger H, Mauthner T, Bischof H et al (2015) In defense of color-based model-free tracking. Computer vision and pattern recognition, pp 2113–2120
Ren S, He K, Girshick RB et al (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell 39(6):1137–1149
Sandler M, Howard A, Zhu M et al (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition
Song Y, Ma C, Wu X et al (2018) VITAL: visual tracking via adversarial learning. Computer vision and pattern recognition, pp 8990–8999
Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. Computer vision and pattern recognition, pp 1–9
Valmadre J, Bertinetto L, Henriques JF et al (2017) End-to-end representation learning for correlation filter based tracking. Computer vision and pattern recognition, pp 5000–5008
Wang M, Liu Y, Huang Z et al (2017) Large margin object tracking with circulant feature maps. Computer vision and pattern recognition, pp 4800–4808
Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Patt Anal Mach Intell 37(9):1834–1848
Yang DD, Cai YZ, Mao N et al (2016) Long-term object tracking based on kernelized correlation filters. Optics Prec Eng 24(8):2037–2049
Zhang Y, Wang L, Qi J et al (2018) Structured siamese network for real-time visual tracking. European conference on computer vision, pp 355–370
Zhou J, Xu W (2015) End-to-end learning of semantic role labeling using recurrent neural networks. International joint conference on natural language processing, pp 1127–1137
Acknowledgements
This work was supported by the Natural Science Foundation of Guangdong Province under Grant No. 2020A1515010784, the National Natural Science Foundation of China under Grant No. 61976063, and the Natural Science Program of Guangdong University of Science and Technology under Grant No. GKY-2021KYQNK-2.
Funding
The work was supported by the Natural Science Foundation of Guangdong Province (No. 2020A1515010784).
Author information
Authors and Affiliations
Contributions
XJ conceived the algorithms, conducted the experimental demonstrations, and wrote the paper; ZL, KL, and SZ wrote the paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Jia, X., Li, Z., Li, K. et al. A method of real-time object tracking combined the temporal information and spatial information. Soft Comput 26, 8689–8698 (2022). https://doi.org/10.1007/s00500-022-07154-0