Robust appearance modeling for object detection and tracking: a survey of deep learning approaches

  • Review
  • Published in: Progress in Artificial Intelligence

Abstract

Object detection and tracking is one of the most complex and challenging problems for artificial intelligence (AI) systems that model perception. Object tracking has practical importance in AI applications such as human–machine interaction, robotics, autonomous driving and extended reality. The fundamental task is to detect objects in one video frame and maintain their identities, or infer their trajectories, across all subsequent frames. Real-world tracking systems typically operate in highly complex and dynamic environments, where object appearance and scene conditions change constantly, making it difficult to characterize target objects adequately with a single model. Traditional solutions rely on handcrafted features derived from rigorous mathematical formulations; designing such features is highly non-trivial and restricts the resulting systems to narrowly focused application settings. Today, deep learning techniques are the preferred approach owing to their strong generalization ability and ease of implementation. This paper surveys the most important deep learning-based appearance modeling techniques. We propose a taxonomy of approaches based on the architectural elements and auxiliary strategies employed in deep learning models for robust appearance modeling. The surveyed methodologies include data-centric techniques, compositional part modeling, similarity learning, memory and attention mechanisms, and approaches that integrate differentiable models within deep learning architectures to explicitly model spatial transformations. For each approach we highlight the fundamental principles, implementation details and application contexts, as well as the main strengths and potential limitations. We also present common datasets, evaluation metrics and performance results.
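The detect-then-associate loop summarized above can be sketched minimally as follows. This is an illustrative assumption, not a method from the survey: it uses greedy intersection-over-union (IoU) matching between frames, with a hypothetical `(x1, y1, x2, y2)` box format and an arbitrary `iou_min` threshold; the surveyed deep learning methods replace this geometric overlap cue with learned appearance models.

```python
# Minimal tracking-by-detection sketch: per-frame detections are
# associated to existing tracks by greedy IoU matching; unmatched
# detections spawn new track identities.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily match tracks {id: box} to this frame's detections.

    Each track takes the detection it overlaps most, provided the
    overlap exceeds iou_min; leftover detections start new tracks.
    """
    next_id = max(tracks, default=-1) + 1
    updated, unmatched = {}, list(detections)
    for tid, box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(box, d))
        if iou(box, best) >= iou_min:
            updated[tid] = best
            unmatched.remove(best)
    for det in unmatched:
        updated[next_id] = det
        next_id += 1
    return updated

# One step of the loop: track 0 follows its slightly shifted box,
# and the distant new detection receives a fresh identity (1).
tracks = associate({0: (10, 10, 50, 50)},
                   [(12, 11, 52, 51), (100, 100, 140, 140)])
```

Greedy matching like this is exactly where appearance modeling matters: when two targets overlap or an object deforms, IoU alone is ambiguous, which motivates the similarity learning and part-based representations the survey covers.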





  33. Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., et al.: Motchallenge: a benchmark for single-camera multiple target tracking. Int. J. Comput Vis. 129(4), 845–881 (2021)

    Article  Google Scholar 

  34. Lan, L., Wang, X., Zhang, S., Tao, D., Gao, W., Huang, T.S.: Interacting tracklets for multi-object tracking. IEEE Trans. Image Process. 27(9), 4585–4597 (2018)

    Article  MathSciNet  Google Scholar 

  35. Milan, A., Schindler, K., Roth, S.: Multi-target tracking by discrete-continuous energy minimization. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2054–2068 (2015)

    Article  Google Scholar 

  36. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293–4302 (2016)

  37. Li, H., Li, Y., Porikli, F., et al.: DeepTrack: learning discriminative feature representations by convolutional neural networks for visual tracking. In: BMVC, vol. 1, p. 3 (2014)

  38. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., et al.: Ssd: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer (2016)

  39. Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R.W., Yang, M.H.: Crest: convolutional residual learning for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2555–2564 (2017)

  40. Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (2014)

  41. Hong, S., You, T., Kwak, S., Han, B.: Online tracking by learning discriminative saliency map with convolutional neural network. In: International Conference on Machine Learning, pp. 597–606. PMLR (2015)

  42. Tao, Q.Q., Zhan, S., Li, X.H., Kurihara, T.: Robust face detection using local CNN and SVM based on kernel combination. Neurocomputing 211, 98–105 (2016)

    Article  Google Scholar 

  43. Niu, X.X., Suen, C.Y.: A novel hybrid CNN-SVM classifier for recognizing handwritten digits. Pattern Recognit. 45(4), 1318–1325 (2012)

    Article  Google Scholar 

  44. Li, H., Li, Y., Porikli, F.: Deeptrack: learning discriminative feature representations online for robust visual tracking. IEEE Trans. Image Process. 25(4), 1834–1848 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  45. Wang, N., Yeung, D.Y.: Learning a deep compact image representation for visual tracking. In: Advances in Neural Information Processing Systems (2013)

  46. Zhou, K., Yang, Y., Hospedales, T., Xiang, T.: Deep domain-adversarial image generation for domain generalisation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 13025–13032 (2020)

  47. Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M., Visual object tracking using adaptive correlation filters. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. vol. 2010, pp. 2544–2550. IEEE (2010)

  48. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 58–66 (2015)

  49. Zhang, F., Ma, S., Qiu, Z., Qi, T.: Learning target-aware background-suppressed correlation filters with dual regression for real-time UAV tracking. Signal Process. 191, 108352 (2022)

    Article  Google Scholar 

  50. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3074–3082 (2015)

  51. Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.: Staple: complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1401–1409 (2016)

  52. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision, pp. 254–265. Springer (2014)

  53. Danelljan, M., Robinson, A., Khan, F.S., Felsberg, M.: Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: European Conference on computer vision, pp. 472–488. Springer (2016)

  54. Tang, S., Andriluka, M., Andres, B., Schiele, B.: Multiple people tracking by lifted multicut and person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3539–3548 (2017)

  55. Kieritz, H., Hubner, W., Arens, M.: Joint detection and online multi-object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1459–1467 (2018)

  56. Wang, Z., Zheng, L., Liu, Y., Wang, S.: Towards real-time multi-object tracking. arXiv preprint arXiv:1909.12605 (2019)

  57. Sultana, F., Sufian, A., Dutta, P.: A review of object detection models based on convolutional neural network. In: Image Processing Based Applications, Intelligent Computing, pp. 1–16 (2020)

  58. Henschel, R., Leal-Taixé, L., Cremers, D., Rosenhahn, B.: Fusion of head and full-body detectors for multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 1428–1437 (2018)

  59. Chu, P., Wang, J., You, Q., Ling, H., Liu, Z.: TransMOT: spatial-temporal graph transformer for multiple object tracking. arXiv preprint arXiv:2104.00194 (2021)

  60. Yu, E., Li, Z., Han, S., Wang, H.: RelationTrack: relation-aware multiple object tracking with decoupled representation. arXiv preprint arXiv:2105.04322 (2021)

  61. Pang, J., Qiu, L., Li, X., Chen, H., Li, Q., Darrell T, et al.: Quasi-dense similarity learning for multiple object tracking. arXiv preprint arXiv:2006.06664 (2020)

  62. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. arXiv preprint arXiv:2004.01888 (2020)

  63. Ullah, M., Cheikh, F.A.: Deep feature based end-to-end transportation network for multi-target tracking. In: 25th IEEE International Conference on Image Processing (ICIP), vol. 2018, pp. 3738-3742. IEEE (2018)

  64. Ren, L., Lu, J., Wang, Z., Tian, Q., Zhou, J.: Collaborative deep reinforcement learning for multi-object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 586–602 (2018)

  65. Leal-Taixé, L., Canton-Ferrer, C., Schindler, K.: Learning by tracking: siamese CNN for robust target association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 33–40 (2016)

  66. Zhang, S., Gong, Y., Huang, J.B., Lim, J., Wang, J., Ahuja, N., et al.: Tracking persons-of-interest via adaptive discriminative features. In: European Conference on Computer Vision, pp. 415–433. Springer (2016)

  67. Chen, L., Ai, H., Shang, C., Zhuang, Z., Bai, B.: Online multi-object tracking with convolutional neural networks. In: 2017 IEEE international conference on image processing (ICIP), pp. 645–649. IEEE (2017)

  68. Xu, Y., Osep, A., Ban, Y., Horaud, R., Leal-Taixé, L., Alameda-Pineda, X.: How to train your deep multi-object tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6787–6796 (2020)

  69. Yoon, Y.C., Kim, D.Y., Song, Y.M., Yoon, K., Jeon, M.: Online multiple pedestrians tracking using deep temporal appearance matching association. Inf. Sci. 561, 326–351 (2021)

    Article  MathSciNet  Google Scholar 

  70. Bergmann, P., Meinhardt, T., Leal-Taixe, L.: Tracking without bells and whistles. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 941–951 (2019)

  71. Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: European Conference on Computer Vision, pp. 474–490. Springer (2020)

  72. Jia, Y.J., Lu, Y., Shen, J., Chen, Q.A., Chen, H., Zhong, Z., et al.: Fooling detection alone is not enough: adversarial attack against multiple object tracking. In: International Conference on Learning Representations (ICLR’20) (2020)

  73. Feichtenhofer, C., Pinz, A., Zisserman, A.: Detect to track and track to detect. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3038–3046 (2017)

  74. Lu, Z., Rathod, V., Votel, R., Huang, J.: Retinatrack: online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14668–14678 (2020)

  75. Wu, J., Cao, J., Song, L., Wang, Y., Yang, M., Yuan, J.: Track to detect and segment: an online multi-object tracker. arXiv preprint arXiv:2103.08808 (2021)

  76. Chaabane, M., Zhang, P., Beveridge, J.R., O’Hara, S.: DEFT: detection embeddings for tracking. arXiv preprint arXiv:2102.02267 (2021)

  77. Sampath, V., Maurtua, I., Martín, J.J.A., Gutierrez, A.: A survey on generative adversarial networks for imbalance problems in computer vision tasks. J. Big Data. 8(1), 1–59 (2021)

    Article  Google Scholar 

  78. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progress Artif. Intell. 5(4), 221–232 (2016)

    Article  Google Scholar 

  79. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 101–117 (2018)

  80. Song, Y., Ma, C., Wu, X., Gong, L., Bao, L., Zuo, W., et al.: Vital: visual tracking via adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8990–8999 (2018)

  81. Bhat, G., Johnander, J., Danelljan, M., Khan, F.S., Felsberg, M.: Unveiling the power of deep tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 483–498 (2018)

  82. Wang, Y., Wei, X., Tang, X., Shen, H., Ding, L.: CNN tracking based on data augmentation. Knowl.-Based Syst. 194, 105594 (2020)

    Article  Google Scholar 

  83. Neuhausen, M., Herbers, P., König, M.: Synthetic data for evaluating the visual tracking of construction workers. In: Construction Research Congress 2020: Computer Applications, pp. 354–361. American Society of Civil Engineers Reston, VA (2020)

  84. Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4340–4349 (2016)

  85. Shermeyer, J., Hossler, T., Van Etten, A., Hogan, D., Lewis, R., Kim, D.: Rareplanes: synthetic data takes flight. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 207–217 (2021)

  86. Han, Y., Zhang, P., Huang, W., Zha, Y., Cooper, G., Zhang, Y.: Robust visual tracking using unlabeled adversarial instance generation and regularized label smoothing. Pattern Recognit. 1–15 (2019)

  87. Cheng, X., Song, C., Gu, Y., Chen, B.: Learning attention for object tracking with adversarial learning network. EURASIP J. Image Video Process. 2020(1), 1–21 (2020)

    Article  Google Scholar 

  88. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al.: Generative adversarial networks. arXiv preprint arXiv:1406.2661 (2014)

  89. Han, Y., Zhang, P., Huang, W., Zha, Y., Cooper, G.D., Zhang, Y.: Robust visual tracking based on adversarial unlabeled instance generation with label smoothing loss regularization. Pattern Recognit. 97, 107027 (2020)

    Article  Google Scholar 

  90. Yin, Y., Xu, D., Wang, X., Zhang, L.: Adversarial feature sampling learning for efficient visual tracking. IEEE Trans. Autom. Sci. Eng. 17(2), 847–857 (2019)

    Article  Google Scholar 

  91. Wang, F., Wang, X., Tang, J., Luo, B., Li, C.: VTAAN: visual tracking with attentive adversarial network. Cognit. Comput. 13, 646–656 (2020)

    Article  Google Scholar 

  92. Javanmardi, M., Qi, X.: Appearance variation adaptation tracker using adversarial network. Neural Netw. 129, 334–343 (2020)

    Article  Google Scholar 

  93. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  94. Kim, H.I., Park, R.H.: Siamese adversarial network for object tracking. Electron. Lett. 55(2), 88–90 (2018)

    Article  Google Scholar 

  95. Wang, X., Li, C., Luo, B., Tang, J.: Sint++: Robust visual tracking via adversarial positive instance generation. In: Proceedings of the IEEE Conference on Computer Vision and pattern recognition, pp. 4864–4873 (2018)

  96. Guo, J., Xu, T., Jiang, S., Shen, Z.: Generating reliable online adaptive templates for visual tracking. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 226–230. IEEE (2018)

  97. Wu, Q., Chen, Z., Cheng, L., Yan, Y., Li, B., Wang, H.: Hallucinated adversarial learning for robust visual tracking. arXiv preprint arXiv:1906.07008 (2019)

  98. Kim, Y., Shin, J., Park, H., Paik, J.: Real-time visual tracking with variational structure attention network. Sensors 19(22), 4904 (2019)

    Article  Google Scholar 

  99. Lin, C.C., Hung, Y., Feris, R., He, L.: Video instance segmentation tracking with a modified vae architecture. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13147–13157 (2020)

  100. Cheng, X., Zhang, Y., Zhou, L., Zheng, Y.: Visual tracking via auto-encoder pair correlation filter. IEEE Trans. Ind. Electron. 67(4), 3288–3297 (2019)

    Article  Google Scholar 

  101. Wang, L., Pham, N.T., Ng, T.T., Wang, G., Chan, K.L., Leman, K.: Learning deep features for multiple object tracking by using a multi-task learning strategy. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 838–842. IEEE (2014)

  102. Liu, P., Li, X., Liu, H., Fu, Z.: Online learned Siamese network with auto-encoding constraints for robust multi-object tracking. Electronics 8(6), 595 (2019)

    Article  Google Scholar 

  103. Xu, L., Niu, R.: Semi-supervised visual tracking based on variational siamese network. In: International Conference on Dynamic Data Driven Application Systems, pp. 328–336. Springer (2020)

  104. Tao, R., Gavves, E., Smeulders, AW.: Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1420–1429 (2016)

  105. Hariharan, B., Girshick, R.: Low-shot visual recognition by shrinking and hallucinating features. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3018–3027 (2017)

  106. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer gan to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 79–88 (2018)

  107. Li, K., Zhang, Y., Li, K., Fu, Y.: Adversarial feature hallucination networks for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13470–13479 (2020)

  108. Schwartz, E., Karlinsky, L., Shtok, J., Harary, S., Marder, M., Feris, R., et al.: Delta-encoder: an effective sample synthesis method for few-shot object recognition. arXiv preprint arXiv:1806.04734 (2018)

  109. Amirkhani, A., Barshooi, A.H., Ebrahimi, A.: Enhancing the robustness of visual object tracking via style transfer. Comput. Mater. Contin. 70(1), 981–997 (2022)

    Google Scholar 

  110. López-Sastre, R.J., Tuytelaars, T., Savarese, S.: Deformable part models revisited: A performance evaluation for object category pose estimation. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1052–1059. IEEE (2011)

  111. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3d object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147–2156 (2016)

  112. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)

    Article  Google Scholar 

  113. Papandreou, G., Zhu, T., Chen, L.C., Gidaris, S., Tompson, J., Murphy, K.: Personlab: person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 269–286 (2018)

  114. Rad, M., Lepetit, V.: Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3828–3836 (2017)

  115. Wang, B., Wang, L., Shuai, B., Zuo, Z., Liu, T., Luk Chan, K., et al.: Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8 (2016)

  116. Uricár, M., Franc, V., Hlavác, V.: Facial landmark tracking by tree-based deformable part model based detector. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 10–17 (2015)

  117. Crivellaro, A., Rad, M., Verdie, Y., Moo Yi, K., Fua, P., Lepetit, V.: A novel representation of parts for accurate 3D object detection and tracking in monocular images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4391–4399 (2015)

  118. Li, J., Wong, H.C., Lo, S.L., Xin, Y.: Multiple object detection by a deformable part-based model and an R-CNN. IEEE Signal Process. Lett. 25(2), 288–292 (2018)

    Article  Google Scholar 

  119. De Ath, G., Everson, R.M.: Part-based tracking by sampling. arXiv preprint arXiv:1805.08511 (2018)

  120. Liu, W., Sun, X., Li, D.: Robust object tracking via online discriminative appearance modeling. EURASIP J. Adv. Signal Process. 2019(1), 1–9 (2019)

    Article  Google Scholar 

  121. Wang, G., Yuan, Y., Chen, X., Li, J., Zhou, X.: Learning discriminative features with multiple granularities for person re-identification. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 274–282 (2018)

  122. Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1904–1912 (2015)

  123. Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.: Detect what you can: detecting and representing objects using holistic models and body parts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1971–1978 (2014)

  124. Gao, J., Zhang, T., Yang, X., Xu, C.: P2t: part-to-target tracking via deep regression learning. IEEE Trans. Image Process. 27(6), 3074–3086 (2018)

    Article  MathSciNet  Google Scholar 

  125. Lim, J.J., Dollar, P., Zitnick III, C.L.: Learned mid-level representation for contour and object detection. Google Patents; 2014. US Patent App. 13/794,857

  126. Wang, S., Lu, H., Yang, F., Yang, M.H.: Superpixel tracking. In: 2011 International Conference on Computer Vision, pp. 1323–1330. IEEE (2011)

  127. Lee, S.H., Jang, W.D., Kim, C.S.: Tracking-by-segmentation using superpixel-wise neural network. IEEE Access 6, 54982–54993 (2018)

    Article  Google Scholar 

  128. Yang, F., Lu, H., Yang, M.H.: Robust superpixel tracking. IEEE Trans. Image Process. 23(4), 1639–1651 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  129. Verelst, T., Blaschko, M., Berman, M.: Generating superpixels using deep image representations. arXiv preprint arXiv:1903.04586 (2019)

  130. Jampani, V., Sun, D., Liu, M.Y., Yang, M.H., Kautz, J.: Superpixel sampling networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 352–368 (2018)

  131. Yang, F., Sun, Q., Jin, H., Zhou, Z.: Superpixel segmentation with fully convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13964–13973 (2020)

  132. Yang, X., Wei, Z., Wang, N., Song, B., Gao, X.: A novel deformable body partition model for MMW suspicious object detection and dynamic tracking. Signal Process. 174, 107627 (2020)

    Article  Google Scholar 

  133. Liu, W., Song, Y., Chen, D., He, S., Yu, Y., Yan, T., et al.: Deformable object tracking with gated fusion. IEEE Trans. Image Process. 28(8), 3766–3777 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  134. Girshick, R., Felzenszwalb, P., McAllester, D.: Object detection with grammar models. Adv. Neural Inf. Process. Syst. 24, 442–450 (2011)

    Google Scholar 

  135. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2009)

    Article  Google Scholar 

  136. Azizpour, H., Laptev, I.: Object detection using strongly-supervised deformable part models. In: European Conference on Computer Vision, pp. 836–849. Springer (2012)

  137. Ouyang, W., Wang, X.: Single-pedestrian detection aided by multi-pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3198–3205 (2013)

  138. Nam, H., Baek, M., Han, B.: Modeling and propagating cnns in a tree structure for visual tracking. arXiv preprint arXiv:1608.07242 (2016)

  139. Wang, J., Fei, C., Zhuang, L., Yu, N.: Part-based multi-graph ranking for visual tracking. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE; 2016. p. 1714–1718

  140. Du, D., Wen, L., Qi, H., Huang, Q., Tian, Q., Lyu, S.: Iterative graph seeking for object tracking. IEEE Trans. Image Process. 27(4), 1809–1821 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  141. Du, D., Qi, H., Li, W., Wen, L., Huang, Q., Lyu, S.: Online deformable object tracking based on structure-aware hyper-graph. IEEE Trans. Image Process. 25(8), 3572–3584 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  142. Wang, L., Lu, H., Yang, M.H.: Constrained superpixel tracking. IEEE Trans. Cybern. 48(3), 1030–1041 (2017)

    Article  Google Scholar 

  143. Jianga, B., Zhang, P., Huang, L.: Visual object tracking by segmentation with graph convolutional network. arXiv preprint arXiv:2009.02523 (2020)

  144. Parizi, S.N., Vedaldi, A., Zisserman, A., Felzenszwalb, P.: Automatic discovery and optimization of parts for image classification. arXiv preprint arXiv:1412.6598 (2014)

  145. Li, Y., Liu, L., Shen, C., Van Den Hengel, A.: Mining mid-level visual patterns with deep CNN activations. Int. J. Comput. Vis. 121(3), 344–364 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  146. Girshick, R., Iandola, F., Darrell, T., Malik, J.: Deformable part models are convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 437–446 (2015)

  147. Sun, Y., Zheng, L., Li, Y., Yang, Y., Tian, Q., Wang, S.: Learning part-based convolutional features for person re-identification. IEEE Trans. Pattern Anal. Mach. Intell. 43, 902–917 (2019)

  148. Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., et al.: Hedged deep tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4303–4311 (2016)

  149. Mordan, T., Thome, N., Henaff, G., Cord, M.: End-to-end learning of latent deformable part-based representations for object detection. Int. J. Comput. Vis. 127(11), 1659–1679 (2019)

  150. Zhang, Z., Xie, C., Wang, J., Xie, L., Yuille, A.L.: Deepvoting: a robust and explainable deep network for semantic part detection under partial occlusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1372–1380 (2018)

  151. Mordan, T., Thome, N., Cord, M., Henaff, G.: Deformable part-based fully convolutional network for object detection. arXiv preprint arXiv:1707.06175 (2017)

  152. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)

  153. Ouyang, W., Zeng, X., Wang, X., Qiu, S., Luo, P., Tian, Y., et al.: DeepID-Net: object detection with deformable part based convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1320–1334 (2016)

  154. Yang, L., Xie, X., Li, P., Zhang, D., Zhang, L.: Part-based convolutional neural network for visual recognition. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 1772–1776. IEEE (2017)

  155. Wang, J., Xie, C., Zhang, Z., Zhu, J., Xie, L., Yuille, A.: Detecting semantic parts on partially occluded objects. arXiv preprint arXiv:1707.07819 (2017)

  156. Wang, J., Zhang, Z., Xie, C., Premachandran, V., Yuille, A.: Unsupervised learning of object semantic parts from internal states of CNNs by population encoding. arXiv preprint arXiv:1511.06855 (2015)

  157. Li, Y., Liu, L., Shen, C., van den Hengel, A.: Mid-level deep pattern mining. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 971–980 (2015)

  158. Zhang, Q., Wu, Y.N., Zhu, S.C.: Interpretable convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8827–8836 (2018)

  159. Stone, A., Wang, H., Stark, M., Liu, Y., Scott Phoenix, D., George, D.: Teaching compositionality to CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5058–5067 (2017)

  160. Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2056–2063 (2013)

  161. Zhu, F., Kong, X., Zheng, L., Fu, H., Tian, Q.: Part-based deep hashing for large-scale person re-identification. IEEE Trans. Image Process. 26(10), 4806–4817 (2017)

  162. Wu, G., Lu, W., Gao, G., Zhao, C., Liu, J.: Regional deep learning model for visual tracking. Neurocomputing 175, 310–323 (2016)

  163. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105 (2012)

  164. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)

  165. Dinov, I.D.: Black box machine-learning methods: neural networks and support vector machines. In: Data Science and Predictive Analytics, pp. 383–422. Springer (2018)

  166. Mozhdehi, R.J., Medeiros, H.: Deep convolutional particle filter for visual tracking. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3650–3654. IEEE (2017)

  167. Yang, B., Hu, X., Wang, F.: Kernel correlation filters based on feature fusion for visual tracking. J. Phys. Conf. Ser. 1601, 052026 (2020)

  168. Yang, Y., Liao, S., Lei, Z., Li, S.: Large scale similarity learning using similar pairs for person verification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)

  169. Hirzer, M., Roth, P.M., Köstinger, M., Bischof, H.: Relaxed pairwise learned metric for person re-identification. In: European Conference on Computer Vision, pp. 780–793. Springer (2012)

  170. Kulis, B., et al.: Metric learning: a survey. Found. Trends Mach. Learn. 5(4), 287–364 (2012)

  171. Jia, Y., Darrell, T.: Heavy-tailed distances for gradient based image descriptors. Adv. Neural Inf. Process. Syst. 24, 397–405 (2011)

  172. Simonyan, K., Vedaldi, A., Zisserman, A.: Learning local feature descriptors using convex optimisation. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1573–1585 (2014)

  173. Tian, S., Shen, S., Tian, G., Liu, X., Yin, B.: End-to-end deep metric network for visual tracking. Vis. Comput. 36(6), 1219–1232 (2020)

  174. Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., et al.: Supervised contrastive learning. arXiv preprint arXiv:2004.11362 (2020)

  175. Zhao, R., Ouyang, W., Wang, X.: Learning mid-level filters for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 144–151 (2014)

  176. Paisitkriangkrai, S., Shen, C., Van Den Hengel, A.: Learning to rank in person re-identification with metric ensembles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1846–1855 (2015)

  177. Yang, W., Liu, Y., Zhang, Q., Zheng, Y.: Comparative object similarity learning-based robust visual tracking. IEEE Access 7, 50466–50475 (2019)

  178. Zhou, Y., Bai, X., Liu, W., Latecki, L.J.: Similarity fusion for visual tracking. Int. J. Comput. Vis. 118(3), 337–363 (2016)

  179. Ning, J., Shi, H., Ni, J., Fu, Y.: Single-stream deep similarity learning tracking. IEEE Access 7, 127781–127787 (2019)

  180. Chicco, D.: Siamese neural networks: an overview. In: Artificial Neural Networks, pp. 73–94 (2021)

  181. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. Adv. Neural Inf. Process. Syst. 6, 737–744 (1993)

  182. Vaquero, L., Brea, V.M., Mucientes, M.: Tracking more than 100 arbitrary objects at 25 FPS through deep learning. Pattern Recognit. 121, 108205 (2022)

  183. Hare, S., Golodetz, S., Saffari, A., Vineet, V., Cheng, M.M., Hicks, S.L., et al.: Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2015)

  184. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.: Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision, pp. 850–865. Springer (2016)

  185. Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., Torr, P.H.: End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2805–2813 (2017)

  186. Held, D., Thrun, S., Savarese, S.: Learning to track at 100 fps with deep regression networks. In: European Conference on Computer Vision, pp. 749–765. Springer (2016)

  187. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980 (2018)

  188. Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7952–7961 (2019)

  189. He, A., Luo, C., Tian, X., Zeng, W.: A twofold siamese network for real-time object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4834–4843 (2018)

  190. Zha, Y., Wu, M., Qiu, Z., Yu, W.: Visual tracking based on semantic and similarity learning. IET Comput. Vis. 13(7), 623–631 (2019)

  191. Zagoruyko, S., Komodakis, N.: Learning to compare image patches via convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4353–4361 (2015)

  192. Hoffer, E., Ailon, N.: Deep metric learning using triplet network. In: International Workshop on Similarity-Based Pattern Recognition, pp. 84–92. Springer (2015)

  193. Liu, Y., Zhang, L., Chen, Z., Yan, Y., Wang, H.: Multi-stream siamese and faster region-based neural network for real-time object tracking. IEEE Trans. Intell. Transp. Syst. 22, 7279–7292 (2020)

  194. Dong, X., Shen, J.: Triplet loss in siamese network for object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 459–474 (2018)

  195. Li, K., Kong, Y., Fu, Y.: Visual object tracking via multi-stream deep similarity learning networks. IEEE Trans. Image Process. 29, 3311–3320 (2019)

  196. Son, J., Baek, M., Cho, M., Han, B.: Multi-object Tracking with Quadruplet Convolutional Neural Networks. IEEE Computer Society (2017)

  197. Son, J., Baek, M., Cho, M., Han, B.: Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5620–5629 (2017)

  198. Zhang, D., Zheng, Z.: Joint representation learning with deep quadruplet network for real-time visual tracking. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

  199. Chen, W., Chen, X., Zhang, J., Huang, K.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 403–412 (2017)

  200. Dike, H.U., Zhou, Y.: A robust quadruplet and faster region-based CNN for UAV video-based multiple object tracking in crowded environment. Electronics 10(7), 795 (2021)

  201. Wu, C., Zhang, Y., Zhang, W., Wang, H., Zhang, Y., Zhang, Y., et al.: Motion guided siamese trackers for visual tracking. IEEE Access 8, 7473–7489 (2020)

  202. Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1763–1771 (2017)

  203. Yang, T., Chan, A.B.: Learning dynamic memory networks for object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 152–167 (2018)

  204. Shi, T., Wang, D., Ren, H.: Triplet network template for siamese trackers. IEEE Access 9, 44426–44435 (2021)

  205. Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: SiamCAR: siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6269–6277 (2020)

  206. Kim, M., Alletto, S., Rigazio, L.: Similarity mapping with enhanced siamese network for multi-object tracking. arXiv preprint arXiv:1609.09156 (2016)

  207. Ma, C., Yang, C., Yang, F., Zhuang, Y., Zhang, Z., Jia, H., et al.: Trajectory factory: tracklet cleaving and re-connection by deep siamese bi-gru for multiple object tracking. In: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2018)

  208. Lee, S., Kim, E.: Multiple object tracking via feature pyramid siamese networks. IEEE Access 7, 8181–8194 (2018)

  209. Liang, Y., Zhou, Y.: LSTM multiple object tracker combining multiple cues. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2351–2355. IEEE (2018)

  210. Ma, L., Tang, S., Black, M.J., Van Gool, L.: Customized multi-person tracker. In: Asian Conference on Computer Vision, pp. 612–628. Springer (2018)

  211. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. arXiv preprint arXiv:1406.6247 (2014)

  212. Jenni, S., Jin, H., Favaro, P.: Steering self-supervised feature learning beyond local pixel statistics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6408–6417 (2020)

  213. Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B., Yu, N.: Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4836–4845 (2017)

  214. Fiaz, M., Mahmood, A., Baek, K.Y., Farooq, S.S., Jung, S.K.: Improving object tracking by added noise and channel attention. Sensors 20(13), 3780 (2020)

  215. Kim, C., Li, F., Rehg, J.M.: Multi-object tracking with neural gating using bilinear LSTM. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 200–215 (2018)

  216. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  217. Zhao, F., Zhang, T., Wu, Y., Tang, M., Wang, J.: Antidecay LSTM for siamese tracking with adversarial learning. IEEE Trans. Neural Netw. Learn. Syst. 32, 4475–4489 (2020)

  218. Chen, X., Gupta, A.: Spatial memory for context reasoning in object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4086–4096 (2017)

  219. Li, J., Wei, Y., Liang, X., Dong, J., Xu, T., Feng, J., et al.: Attentive contexts for object detection. IEEE Trans. Multimed. 19(5), 944–954 (2016)

  220. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)

  221. Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755 (2014)

  222. Kosiorek, A.R., Bewley, A., Posner, I.: Hierarchical attentive recurrent tracking. arXiv preprint arXiv:1706.09262 (2017)

  223. Cui, Z., Xiao, S., Feng, J., Yan, S.: Recurrently target-attending tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1449–1458 (2016)

  224. Milan, A., Rezatofighi, S.H., Dick, A., Reid, I., Schindler, K.: Online multi-target tracking using recurrent neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)

  225. Quan, R., Zhu, L., Wu, Y., Yang, Y.: Holistic LSTM for pedestrian trajectory prediction. IEEE Trans. Image Process. 30, 3229–3239 (2021)

  226. Shu, X., Tang, J., Qi, G., Liu, W., Yang, J.: Hierarchical long short-term concurrent memory for human interaction recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1110–1118 (2019)

  227. Fang, K., Xiang, Y., Li, X., Savarese, S.: Recurrent autoregressive networks for online multi-object tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 466–475. IEEE (2018)

  228. Zhang, S., Yang, J., Schiele, B.: Occluded pedestrian detection through guided attention in CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6995–7003 (2018)

  229. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

  230. Stollenga, M., Masci, J., Gomez, F., Schmidhuber, J.: Deep networks with internal selective attention through feedback connections. arXiv preprint arXiv:1407.3068 (2014)

  231. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: International Conference on Machine Learning, pp. 1319–1327. PMLR (2013)

  232. Zhao, M., Okada, K., Inaba, M.: TrTr: visual tracking with transformer. arXiv preprint arXiv:2105.03817 (2021)

  233. Xu, Y., Ban, Y., Delorme, G., Gan, C., Rus, D., Alameda-Pineda, X.: TransCenter: transformers with dense queries for multiple-object tracking. arXiv preprint arXiv:2103.15145 (2021)

  234. Zeng, F., Dong, B., Wang, T., Chen, C., Zhang, X., Wei, Y.: MOTR: end-to-end multiple-object tracking with transformer. arXiv preprint arXiv:2105.03247 (2021)

  235. Sun, P., Jiang, Y., Zhang, R., Xie, E., Cao, J., Hu, X., et al.: TransTrack: multiple-object tracking with transformer. arXiv preprint arXiv:2012.15460 (2020)

  236. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  237. Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: Trackformer: multi-object tracking with transformers. arXiv preprint arXiv:2101.02702 (2021)

  238. Chen, Y., Cao, Y., Hu, H., Wang, L.: Memory enhanced global-local aggregation for video object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10337–10346 (2020)

  239. Xiao, F., Lee, Y.J.: Spatial-temporal memory networks for video object detection. arXiv preprint arXiv:1712.06317 (2017)

  240. Deng, H., Hua, Y., Song, T., Zhang, Z., Xue, Z., Ma, R., et al.: Object guided external memory network for video object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6678–6687 (2019)

  241. Wang, L., Zhang, L., Wang, J., Yi, Z.: Memory mechanisms for discriminative visual tracking algorithms with deep neural networks. IEEE Trans. Cogn. Dev. Syst. 12(1), 98–108 (2019)

  242. Jeon, S., Kim, S., Min, D., Sohn, K.: Parn: pyramidal affine regression networks for dense semantic correspondence. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 351–366 (2018)

  243. Xie, Y., Shen, J., Wu, C.: Affine geometrical region CNN for object tracking. IEEE Access 8, 68638–68648 (2020)

  244. Vu, H.T., Huang, C.C.: A multi-task convolutional neural network with spatial transform for parking space detection. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 1762–1766. IEEE (2017)

  245. Zhou, Q., Zhong, B., Zhang, Y., Li, J., Fu, Y.: Deep alignment network based multi-person tracking with occlusion and motion reasoning. IEEE Trans. Multimed. 21(5), 1183–1194 (2018)

  246. Li, Y., Bozic, A., Zhang, T., Ji, Y., Harada, T., Nießner, M.: Learning to optimize non-rigid tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4910–4918 (2020)

  247. Li, C., Dobler, G., Feng, X., Wang, Y.: Tracknet: simultaneous object detection and tracking and its application in traffic video analysis. arXiv preprint arXiv:1902.01466 (2019)

  248. Zhu, H., Liu, H., Zhu, C., Deng, Z., Sun, X.: Learning spatial-temporal deformable networks for unconstrained face alignment and tracking in videos. Pattern Recognit. 107, 107354 (2020)

  249. Zhang, M., Wang, Q., Xing, J., Gao, J., Peng, P., Hu, W., et al.: Visual tracking via spatially aligned correlation filters network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 469–485 (2018)

  250. Zhang, X., Lei, H., Ma, Y., Luo, S., Fan, X.: Spatial transformer part-based siamese visual tracking. In: 2020 39th Chinese Control Conference (CCC), pp. 7269–7274. IEEE (2020)

  251. Qian, Y., Yang, M., Zhao, X., Wang, C., Wang, B.: Oriented spatial transformer network for pedestrian detection using fish-eye camera. IEEE Trans. Multimed. 22(2), 421–431 (2019)

  252. Luo, H., Jiang, W., Fan, X., Zhang, C.: Stnreid: deep convolutional networks with pairwise spatial transformer networks for partial person re-identification. IEEE Trans. Multimed. 22(11), 2905–2913 (2020)

  253. Li, D., Chen, X., Zhang, Z., Huang, K.: Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 384–393 (2017)

  254. Zhang, Y., Tang, Y., Fang, B., Shang, Z.: Multi-object tracking using deformable convolution networks with tracklets updating. Int. J. Wavelets Multiresolut. Inf. Process. 17(06), 1950042 (2019)

  255. Wu, H., Xu, Z., Zhang, J., Jia, G.: Offset-adjustable deformable convolution and region proposal network for visual tracking. IEEE Access 7, 85158–85168 (2019)

  256. Cao, W.M., Chen, X.J.: Deformable convolutional networks tracker. In: DEStech Transactions on Computer Science and Engineering (iteee) (2019)

  257. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. arXiv preprint arXiv:1506.02025 (2015)

  258. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)

  259. Mumuni, A., Mumuni, F.: CNN architectures for geometric transformation-invariant feature representation in computer vision: a review. SN Comput. Sci. 2(5), 1–23 (2021)

  260. Wang, X., Shrivastava, A., Gupta, A.: A-fast-rcnn: hard positive generation via adversary for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2606–2615 (2017)

  261. Lin, C.H., Yumer, E., Wang, O., Shechtman, E., Lucey, S.: St-gan: spatial transformer generative adversarial networks for image compositing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9455–9464 (2018)

  262. Zhang, D., Zheng, Z., Wang, T., He, Y.: HROM: learning high-resolution representation and object-aware masks for visual object tracking. Sensors 20(17), 4807 (2020)

  263. Johnander, J., Danelljan, M., Khan, F.S., Felsberg, M.: DCCO: towards deformable continuous convolution operators for visual tracking. In: International Conference on Computer Analysis of Images and Patterns, pp. 55–67. Springer (2017)

  264. Araujo, A., Norris, W., Sim, J.: Computing receptive fields of convolutional neural networks. Distill 4(11), e21 (2019)

  265. Chen, Z., Zhong, B., Li, G., Zhang, S., Ji, R.: Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6668–6677 (2020)

  266. Jiang, X., Li, P., Zhen, X., Cao, X.: Model-free tracking with deep appearance and motion features integration. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 101–110. IEEE (2019)

  267. Dequaire, J., Rao, D., Ondruska, P., Wang, D., Posner, I.: Deep tracking on the move: Learning to track the world from a moving vehicle using recurrent neural networks. arXiv preprint arXiv:1609.09365 (2016)

  268. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

  269. Li, Y., Zhang, X., Chen, D.: Csrnet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100 (2018)

  270. Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., et al.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460. IEEE (2018)

  271. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. arXiv preprint arXiv:2006.10721 (2020)

  272. Weng, X., Wu, S., Beainy, F., Kitani, K.M.: Rotational rectification network: enabling pedestrian detection for mobile vision. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1084–1092. IEEE (2018)

  273. Marcos, D., Volpi, M., Tuia, D.: Learning rotation invariant convolutional filters for texture classification. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2012–2017. IEEE (2016)

  274. Jacobsen, J.H., De Brabandere, B., Smeulders, A.W.: Dynamic steerable blocks in deep residual networks. arXiv preprint arXiv:1706.00598 (2017)

  275. Tarasiuk, P., Pryczek, M.: Geometric transformations embedded into convolutional neural networks. J. Appl. Comput. Sci. 24(3), 33–48 (2016)

  276. Henriques, J.F., Vedaldi, A.: Warped convolutions: efficient invariance to spatial transformations. In: International Conference on Machine Learning, pp. 1461–1469. PMLR (2017)

  277. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  278. Yang, L., Han, Y., Chen, X., Song, S., Dai, J., Huang, G.: Resolution adaptive networks for efficient inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2369–2378 (2020)

  279. Tamura, M., Horiguchi, S., Murakami, T.: Omnidirectional pedestrian detection by rotation invariant training. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1989–1998. IEEE (2019)

  280. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)

  281. Coors, B., Condurache, A.P., Geiger, A.: Spherenet: learning spherical representations for detection and classification in omnidirectional images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 518–533 (2018)

  282. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., et al.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2272–2280 (2021)

  283. Hao, Z., Liu, Y., Qin, H., Yan, J., Li, X., Hu, X.: Scale-aware face detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6186–6195 (2017)

  284. Yang, Z., Xu, Y., Dai, W., Xiong, H.: Dynamic-stride-net: deep convolutional neural network with dynamic stride. In: Optoelectronic Imaging and Multimedia Technology VI, vol. 11187, p. 1118707. International Society for Optics and Photonics (2019)

  285. Wen, L., Du, D., Cai, Z., Lei, Z., Chang, M.C., Qi, H., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 193, 102907 (2020)

  286. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)

  287. Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernandez, G., et al.: The visual object tracking VOT2015 challenge results. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 1–23 (2015)

  288. Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin Zajc, L., et al.: The visual object tracking VOT2016 challenge results. In: Computer Vision—ECCV 2016 Workshops, pp. 777–823 (2016)

  289. Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin Zajc, L., et al.: The visual object tracking VOT2017 challenge results. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 1949–1972 (2017)

  290. Leal-Taixé, L., Milan, A., Reid, I., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942 (2015)

  291. Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016)

  292. Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003 (2020)

  293. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)

  294. Kiani Galoogahi, H., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: a benchmark for higher frame rate object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1125–1134 (2017)

  295. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: European Conference on Computer Vision, pp. 445–461. Springer (2016)

  296. Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: algorithms and benchmark. IEEE Trans. Image Process. 24(12), 5630–5644 (2015)

  297. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1562–1577 (2019)

  298. Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5374–5383 (2019)

  299. Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 300–317 (2018)

  300. Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 370–386 (2018)

  301. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  302. Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5296–5305 (2017)

  303. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411–2418 (2013)

  304. Zhang, Z., Peng, H.: Deeper and wider siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4591–4600 (2019)

  305. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: Eco: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646 (2017)

  306. Yang, T., Chan, A.B.: Visual tracking via dynamic memory networks. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 360–374 (2019)

  307. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4282–4291 (2019)

  308. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4660–4669 (2019)

  309. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6182–6191 (2019)

  310. Lukezic, A., Matas, J., Kristan, M.: D3S: a discriminative single shot segmentation tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7133–7142 (2020)

  311. Xie, F., Yang, W., Zhang, K., Liu, B., Wang, G., Zuo, W.: Learning spatio-appearance memory network for high-performance visual tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2678–2687 (2021)

  312. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129, 3069–3087 (2021)


  313. Zheng, L., Tang, M., Chen, Y., Zhu, G., Wang, J., Lu, H.: Improving multiple object tracking with single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2453–2462 (2021)

  314. Zhu, J., Yang, H., Liu, N., Kim, M., Zhang, W., Yang, M.H.: Online multi-object tracking with dual matching attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 366–382 (2018)

  315. Brasó, G., Leal-Taixé, L.: Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6247–6257 (2020)

  316. Saleh, F., Aliakbarian, S., Rezatofighi, H., Salzmann, M., Gould, S.: Probabilistic tracklet scoring and inpainting for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14329–14339 (2021)

  317. Zhang, Y., Sun, P., Jiang, Y., Yu, D., Yuan, Z., Luo, P., et al.: ByteTrack: multi-object tracking by associating every detection box. arXiv preprint arXiv:2110.06864 (2021)

  318. Wang, Q., Zheng, Y., Pan, P., Xu, Y.: Multiple object tracking with correlation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3876–3886 (2021)

  319. Liang, C., Zhang, Z., Zhou, X., Li, B., Lu, Y., Hu, W.: One more check: making “fake background” be tracked again. arXiv preprint arXiv:2104.09441 (2021)

  320. Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 1–10 (2008)


  321. Wu, S., Xu, Y.: DSN: a new deformable subnetwork for object detection. IEEE Trans. Circuits Syst. Video Technol. 30(7), 2057–2066 (2019)


  322. Liu, Y., Duanmu, M., Huo, Z., Qi, H., Chen, Z., Li, L., et al.: Exploring multi-scale deformable context and channel-wise attention for salient object detection. Neurocomputing 428, 92–103 (2021)


  323. Lee, H., Choi, S., Kim, C.: A memory model based on the siamese network for long-term tracking. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)

  324. Fiaz, M., Mahmood, A., Jung, S.K.: Learning soft mask based feature fusion with channel and spatial attention for robust visual object tracking. Sensors 20(14), 4021 (2020)

  325. Lee, D.J.L., Macke, S., Xin, D., Lee, A., Huang, S., Parameswaran, A.G.: A Human-in-the-loop Perspective on AutoML: milestones and the road ahead. IEEE Data Eng Bull. 42(2), 59–70 (2019)


  326. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)

  327. Kandasamy, K., Neiswanger, W., Schneider, J., Poczos, B., Xing, E.: Neural architecture search with Bayesian optimisation and optimal transport. arXiv preprint arXiv:1802.07191 (2018)

  328. Lu, Z., Whalen, I., Boddeti, V., Dhebar, Y., Deb, K., Goodman, E., et al.: NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 419–427 (2019)


Author information


Corresponding author

Correspondence to Alhassan Mumuni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mumuni, A., Mumuni, F. Robust appearance modeling for object detection and tracking: a survey of deep learning approaches. Prog Artif Intell 11, 279–313 (2022). https://doi.org/10.1007/s13748-022-00290-6
