Abstract
Object tracking in thermal imagery is a challenging problem with a growing range of applications. Fusing the complementary features of RGB and thermal images can substantially improve tracking performance. mfSiamTrack (Multi-modality fusion for Siamese Network based RGB-T Tracking) is a dual-mode short-term single-object (STSO) tracker that operates in either Thermal mode or Multi-modality fusion (RGBT) mode. RGBT mode is activated when the dataset contains both thermal-infrared sequences and the corresponding RGB sequences: the complementary features of the two modalities are fused, and tracking is performed on the fused sequences. When only thermal sequences are available, the tracker falls back to Thermal mode. For multi-modality fusion, an auto-encoder (AE) based fusion network is proposed. The encoder decomposes the RGB and thermal images into background and detail feature maps, the Fusion Layer fuses the corresponding feature maps of the two source images, and the decoder reconstructs the fused image. To handle objects at different scales and viewpoints, mfSiamTrack introduces a Multi-Scale Structural Similarity (MS-SSIM) based reconstruction method. mfSiamTrack is a fully-convolutional Siamese tracker that also incorporates semi-supervised video object segmentation (VOS) for pixel-wise target identification. The tracker was evaluated on the VOT-RGBT2019 dataset using Accuracy, Robustness, Expected Average Overlap (EAO) and average IoU as performance measures, and it is observed to outperform the state of the art.
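The abstract does not give the exact form of the MS-SSIM reconstruction objective, but a minimal sketch of the standard multi-scale SSIM (Wang et al., 2003) turned into a reconstruction loss might look as follows. For simplicity this version computes the SSIM statistics globally per image rather than with the usual 11x11 Gaussian window, and all function names are illustrative rather than taken from the paper:

```python
import numpy as np

def _downsample(img):
    # 2x2 average pooling, halving each spatial dimension.
    h, w = img.shape
    return img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def _lum_cs(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Global (whole-image) luminance and contrast-structure terms of SSIM;
    # the standard formulation uses a sliding Gaussian window instead.
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2 * cov + c2) / (x.var() + y.var() + c2)
    return lum, max(cs, 1e-8)  # clip so fractional powers stay real

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    # Contrast-structure is accumulated at every scale; luminance is
    # applied only at the coarsest scale, as in the original MS-SSIM.
    score = 1.0
    for j, w in enumerate(weights):
        lum, cs = _lum_cs(x, y)
        score *= cs ** w
        if j == len(weights) - 1:
            score *= lum ** w
        else:
            x, y = _downsample(x), _downsample(y)
    return score

def ms_ssim_loss(reconstruction, target):
    # Reconstruction loss: 0 for a perfect match, approaching 1 as
    # structural similarity vanishes.
    return 1.0 - ms_ssim(reconstruction, target)
```

Under this reading, the decoder would be trained to minimise `ms_ssim_loss` between the reconstructed fused image and the source images, so that structure is preserved across several spatial scales rather than at a single resolution.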
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Chithra, A.V., Mishra, D. (2023). Multi-modality Fusion for Siamese Network Based RGB-T Tracking (mfSiamTrack). In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1776. Springer, Cham. https://doi.org/10.1007/978-3-031-31407-0_31
Print ISBN: 978-3-031-31406-3
Online ISBN: 978-3-031-31407-0