
Spatial-temporal Regularized Multi-modality Correlation Filters for Tracking with Re-detection

Published: 29 May 2021

Abstract

The development of multi-spectrum image sensing technology has generated great interest in exploiting information from multiple modalities (e.g., RGB and infrared) to solve computer vision problems. In this article, we investigate how to exploit information from the RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms that exploit multi-modality information in appearance modeling have been developed, they still face challenges arising mainly from the following aspects: (1) a lack of robustness to large appearance changes and dynamic backgrounds, (2) failure to re-capture the object after tracking loss, and (3) difficulty in determining the reliability of the different modalities. To address these issues and integrate multiple modalities effectively, we propose a new tracking-by-detection algorithm called the Adaptive Spatial-temporal Regularized Multi-Modality Correlation Filter. In particular, an adaptive spatial-temporal regularization is imposed on the correlation filter framework, in which the spatial regularization helps to suppress effects from the cluttered background, while the temporal regularization enables the adaptive incorporation of historical appearance cues to deal with appearance changes. In addition, a dynamic modality-weight learning algorithm is integrated into the correlation filter training, which ensures that more reliable modalities gain more importance in target tracking. Experimental results demonstrate the effectiveness of the proposed method.
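The two core ideas in the abstract can be illustrated with a minimal, hypothetical Python sketch (not the authors' implementation): a single-channel correlation-filter update in the Fourier domain with an element-wise spatial penalty and a temporal term that pulls the new filter toward the previous frame's filter, plus a peak-to-sidelobe-ratio (PSR) heuristic that weights each modality's response map by its reliability before fusion. The closed form and the PSR weighting are simplifications chosen for illustration; a faithful implementation would require the paper's full objective and solver.

```python
import numpy as np

def update_filter(x, y, w_spatial, h_prev, lam=0.01, mu=0.1):
    """Ridge-style correlation-filter update in the Fourier domain.

    Simplification: the spatial penalty w_spatial is applied element-wise
    in the frequency domain, and the temporal term mu pulls the solution
    toward h_prev (the previous frame's filter).
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    H_prev = np.fft.fft2(h_prev)
    H = (np.conj(X) * Y + mu * H_prev) / (np.conj(X) * X + lam * w_spatial + mu)
    return np.real(np.fft.ifft2(H))

def psr(response):
    """Peak-to-sidelobe ratio: a common proxy for response reliability."""
    return (response.max() - response.mean()) / (response.std() + 1e-8)

def fuse_responses(responses):
    """Weight each modality's response map by its PSR, then sum."""
    weights = np.array([psr(r) for r in responses])
    weights = weights / weights.sum()
    return sum(w * r for w, r in zip(weights, responses))
```

In a tracker loop, one would run `update_filter` per modality each frame, correlate the filters with the next frame's search region, and localize the target at the peak of the fused response; the reliability weights let a degraded modality (e.g., RGB at night) contribute less.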




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
May 2021, 410 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3461621

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 29 May 2021
    Accepted: 01 October 2020
    Revised: 01 October 2019
    Received: 01 May 2019
    Published in TOMM Volume 17, Issue 2


    Author Tags

    1. Multi-modality fusion
    2. behavior understanding
    3. tracking

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Hong Kong Research Grant Council


Cited By

• (2024) ASIFusion: An Adaptive Saliency Injection-Based Infrared and Visible Image Fusion Network. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 9, 1–23. DOI: 10.1145/3665893
• (2024) Robust visual tracking via modified Harris hawks optimization. Image and Vision Computing 144, 104959. DOI: 10.1016/j.imavis.2024.104959
• (2024) A dual-channel correlation filtering tracker for real-time tracking based on deep features of improved CaffeNet and integrated manual features. The Visual Computer. DOI: 10.1007/s00371-024-03664-0
• (2023) AD-SiamRPN: Anti-Deformation Object Tracking via an Improved Siamese Region Proposal Network on Hyperspectral Videos. Remote Sensing 15, 7, 1731. DOI: 10.3390/rs15071731
• (2023) Channel-Weighted Structured Correlation Filters for UAV Tracking. IEEE Geoscience and Remote Sensing Letters 20, 1–5. DOI: 10.1109/LGRS.2023.3305651
• (2023) GOMT: Multispectral video tracking based on genetic optimization and multi-features integration. IET Image Processing 17, 5, 1578–1589. DOI: 10.1049/ipr2.12739
• (2023) Recent advances in object tracking using hyperspectral videos: a survey. Multimedia Tools and Applications 83, 18, 56155–56181. DOI: 10.1007/s11042-023-17758-9
• (2022) BS-SiamRPN: Hyperspectral Video Tracking based on Band Selection and the Siamese Region Proposal Network. In 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), 1–8. DOI: 10.1109/WHISPERS56178.2022.9955025
• (2021) Anti-interference small target tracking from infrared dual waveband imagery. Infrared Physics & Technology 118, 103882. DOI: 10.1016/j.infrared.2021.103882