Abstract
Multi-object tracking (MOT) requires accurately identifying and tracking multiple targets over long periods. However, tracking performance is highly susceptible to factors such as target deformation and occlusion. Moreover, most existing MOT models perform simple aggregation and classification of target features, ignoring the inherent differences and connections between detection and re-identification, which often leads to frequent identity switches. To address these issues, we propose IFMOT, a simple and efficient tracker that combines an interactive perception network with feature optimization. Specifically, the interactive perception network uses a multi-head cross-attention mechanism to alleviate conflicts between the detection and re-identification features, and a feature optimization module refines the target representation to improve the quality of the extracted feature embeddings. Furthermore, a feature integration similarity matrix is used to comprehensively assess the similarity between objects and to handle unreliable similarity matching. Experiments on the MOT16, MOT17, MOT20, and DanceTrack datasets show that the proposed method achieves higher accuracy than other state-of-the-art trackers while maintaining comparable tracking speed.
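The article itself includes no code, but as a rough, hypothetical sketch of how a feature-integrated similarity matrix might fuse spatial overlap (IoU) with appearance similarity before association, consider the Python example below. The weighting factor alpha, the helper functions, and the embedding dimension are illustrative assumptions and are not details taken from IFMOT.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks, dets):
    """Pairwise IoU between track boxes and detection boxes (x1, y1, x2, y2)."""
    t = tracks[:, None, :]                      # (T, 1, 4)
    d = dets[None, :, :]                        # (1, D, 4)
    lt = np.maximum(t[..., :2], d[..., :2])     # top-left of intersection
    rb = np.minimum(t[..., 2:], d[..., 2:])     # bottom-right of intersection
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter + 1e-9)

def fused_similarity(tracks, dets, track_emb, det_emb, alpha=0.5):
    """Blend spatial IoU with appearance cosine similarity into one matrix."""
    iou = iou_matrix(tracks, dets)
    t = track_emb / (np.linalg.norm(track_emb, axis=1, keepdims=True) + 1e-9)
    d = det_emb / (np.linalg.norm(det_emb, axis=1, keepdims=True) + 1e-9)
    cos = t @ d.T                               # (T, D) appearance similarity
    return alpha * iou + (1.0 - alpha) * cos

# Hungarian assignment on the fused cost (1 - similarity)
tracks = np.array([[10, 10, 50, 80], [60, 20, 100, 90]], dtype=float)
dets = np.array([[12, 11, 52, 82], [58, 22, 99, 91]], dtype=float)
sim = fused_similarity(tracks, dets,
                       track_emb=np.random.rand(2, 128),
                       det_emb=np.random.rand(2, 128))
row, col = linear_sum_assignment(1.0 - sim)
print(list(zip(row, col)))                      # matched (track, detection) pairs
```

In such a scheme, weighting alpha toward the IoU term favors motion continuity in crowded scenes, while emphasizing the cosine term relies more on the re-identification embeddings; how IFMOT actually balances and gates these cues is described in the paper itself.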
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
No funding was received for conducting this study.
Author information
Contributions
D. C. and W. R. proposed the idea, designed and performed the simulations, and wrote the paper. C. Y. and B. W. analyzed the data. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest in this study.
Ethical approval
This research does not involve any studies with human participants or animals performed by any of the authors.
Consent to participate
Not applicable.
Consent for publication
All authors of this manuscript consent to its publication.
Additional information
Communicated by Haojie Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cao, D., Ren, W., Yu, C. et al. IFMOT: interactive perception and feature optimization network for multi-object tracking. Multimedia Systems 31, 136 (2025). https://doi.org/10.1007/s00530-025-01694-9