
Multi-modality Fusion for Siamese Network Based RGB-T Tracking (mfSiamTrack)

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1776)

Abstract

Object tracking in thermal imagery is a challenging problem with a growing range of applications. By fusing the complementary features of RGB and thermal images, object tracking algorithms can be enhanced to produce better results. mfSiamTrack (Multi-modality fusion for Siamese Network based RGB-T Tracking) is a dual-mode short-term single-object (STSO) tracker. It operates in thermal mode and in multi-modality fusion mode (RGBT mode). The RGBT mode is activated when the dataset contains thermal infrared sequences together with the corresponding RGB sequences: the complementary features of the two modalities are fused, and tracking is performed on the fused sequences. If the dataset contains only thermal sequences, the tracker operates in thermal mode. An auto-encoder (AE) based fusion network is proposed for multi-modality fusion. The encoder decomposes the RGB and thermal images into background and detail feature maps, the fusion layer fuses the background and detail feature maps of the source images, and the decoder reconstructs the fused image. To handle objects at different scales and viewpoints, mfSiamTrack introduces a Multi-Scale Structural Similarity (MS-SSIM) based reconstruction method. mfSiamTrack is a fully convolutional Siamese-network-based tracker that also incorporates semi-supervised video object segmentation (VOS) for pixel-wise target identification. The tracker was evaluated on the VOT-RGBT2019 dataset with Accuracy, Robustness, Expected Average Overlap (EAO) and Average IoU as performance evaluation measures, and was observed to outperform the state of the art.
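The decompose-fuse-reconstruct pipeline described in the abstract can be illustrated with a small hand-crafted stand-in. Note that the paper's actual decomposition and fusion are performed by a learned auto-encoder; the box-filter decomposition, the mean/max fusion rules, and the function names below are assumptions chosen only to make the background/detail idea concrete.

```python
import numpy as np

def decompose(img, k=15):
    """Hypothetical two-scale decomposition: a k-by-k box filter yields a
    low-frequency 'background' map; the residual is the 'detail' map."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    background = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            background[i, j] = padded[i:i + k, j:j + k].mean()
    detail = img - background
    return background, detail

def fuse(rgb_gray, thermal):
    """Fuse a grayscale RGB image and a thermal image: average the
    background maps, keep the stronger detail response per pixel, then
    reconstruct additively."""
    b_rgb, d_rgb = decompose(rgb_gray)
    b_th, d_th = decompose(thermal)
    background = 0.5 * (b_rgb + b_th)
    detail = np.where(np.abs(d_rgb) >= np.abs(d_th), d_rgb, d_th)
    return background + detail
```

In the paper this role is played by the AE's fusion layer operating on learned feature maps, with an MS-SSIM term in the reconstruction loss rather than a fixed additive rule.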



Author information


Corresponding author

Correspondence to A. V. Chithra.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chithra, A.V., Mishra, D. (2023). Multi-modality Fusion for Siamese Network Based RGB-T Tracking (mfSiamTrack). In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1776. Springer, Cham. https://doi.org/10.1007/978-3-031-31407-0_31


  • DOI: https://doi.org/10.1007/978-3-031-31407-0_31


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31406-3

  • Online ISBN: 978-3-031-31407-0

  • eBook Packages: Computer Science, Computer Science (R0)
