
Spatial-temporal Regularized Multi-modality Correlation Filters for Tracking with Re-detection

Published: 29 May 2021

Abstract

The development of multi-spectrum image sensing technology has generated great interest in exploiting information from multiple modalities (e.g., RGB and infrared) to solve computer vision problems. In this article, we investigate how to exploit information from the RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms that exploit multi-modality information in appearance modeling have been developed, they still face challenges arising mainly from the following aspects: (1) a lack of robustness to large appearance changes and dynamic backgrounds, (2) failure to re-capture the object after tracking loss, and (3) difficulty in determining the reliability of the different modalities. To address these issues and integrate multiple modalities effectively, we propose a new tracking-by-detection algorithm called the Adaptive Spatial-temporal Regularized Multi-Modality Correlation Filter. In particular, an adaptive spatial-temporal regularization is imposed on the correlation filter framework, in which the spatial regularization helps to suppress effects from the cluttered background, while the temporal regularization enables the adaptive incorporation of historical appearance cues to deal with appearance changes. In addition, a dynamic modality-weight learning algorithm is integrated into the correlation filter training, which ensures that more reliable modalities gain more importance in target tracking. Experimental results demonstrate the effectiveness of the proposed method.
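The two core ideas in the abstract can be illustrated with a minimal, hypothetical Python sketch (not the authors' implementation): a single-channel correlation-filter update in the Fourier domain with an element-wise spatial penalty and a temporal term that pulls the new filter toward the previous frame's filter, plus a peak-to-sidelobe-ratio (PSR) heuristic that weights each modality's response map by its reliability before fusion. The closed form and the PSR weighting are simplifications chosen for illustration; a faithful implementation would require the paper's full objective and solver.

```python
import numpy as np

def update_filter(x, y, w_spatial, h_prev, lam=0.01, mu=0.1):
    """Ridge-style correlation-filter update in the Fourier domain.

    Simplification: the spatial penalty w_spatial is applied element-wise
    in the frequency domain, and the temporal term mu pulls the solution
    toward h_prev (the previous frame's filter).
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    H_prev = np.fft.fft2(h_prev)
    H = (np.conj(X) * Y + mu * H_prev) / (np.conj(X) * X + lam * w_spatial + mu)
    return np.real(np.fft.ifft2(H))

def psr(response):
    """Peak-to-sidelobe ratio: a common proxy for response reliability."""
    return (response.max() - response.mean()) / (response.std() + 1e-8)

def fuse_responses(responses):
    """Weight each modality's response map by its PSR, then sum."""
    weights = np.array([psr(r) for r in responses])
    weights = weights / weights.sum()
    return sum(w * r for w, r in zip(weights, responses))
```

In a tracker loop, one would run `update_filter` per modality each frame, correlate the filters with the next frame's search region, and localize the target at the peak of the fused response; the reliability weights let a degraded modality (e.g., RGB at night) contribute less.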




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
May 2021, 410 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3461621

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 29 May 2021
    Accepted: 01 October 2020
    Revised: 01 October 2019
    Received: 01 May 2019
    Published in TOMM Volume 17, Issue 2


    Author Tags

    1. Multi-modality fusion
    2. behavior understanding
    3. tracking

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Hong Kong Research Grant Council


Cited By

• (2024) ASIFusion: An Adaptive Saliency Injection-Based Infrared and Visible Image Fusion Network. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 9, 1–23. DOI: 10.1145/3665893
• (2024) Robust visual tracking via modified Harris hawks optimization. Image and Vision Computing 144, 104959. DOI: 10.1016/j.imavis.2024.104959
• (2024) A dual-channel correlation filtering tracker for real-time tracking based on deep features of improved CaffeNet and integrated manual features. The Visual Computer. DOI: 10.1007/s00371-024-03664-0
• (2023) AD-SiamRPN: Anti-Deformation Object Tracking via an Improved Siamese Region Proposal Network on Hyperspectral Videos. Remote Sensing 15, 7, 1731. DOI: 10.3390/rs15071731
• (2023) Channel-Weighted Structured Correlation Filters for UAV Tracking. IEEE Geoscience and Remote Sensing Letters 20, 1–5. DOI: 10.1109/LGRS.2023.3305651
• (2023) GOMT: Multispectral video tracking based on genetic optimization and multi-features integration. IET Image Processing 17, 5, 1578–1589. DOI: 10.1049/ipr2.12739
• (2023) Recent advances in object tracking using hyperspectral videos: a survey. Multimedia Tools and Applications 83, 18, 56155–56181. DOI: 10.1007/s11042-023-17758-9
• (2022) BS-SiamRPN: Hyperspectral Video Tracking based on Band Selection and the Siamese Region Proposal Network. In 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), 1–8. DOI: 10.1109/WHISPERS56178.2022.9955025
• (2021) Anti-interference small target tracking from infrared dual waveband imagery. Infrared Physics & Technology 118, 103882. DOI: 10.1016/j.infrared.2021.103882