IFMOT: interactive perception and feature optimization network for multi-object tracking

  • Regular Paper
Multimedia Systems

Abstract

Multi-object tracking (MOT) requires accurately identifying and tracking multiple targets over long periods. However, tracking performance is highly susceptible to factors such as target deformation and occlusion. Moreover, most existing MOT models simply aggregate and classify target features, ignoring the inherent differences and connections between detection and re-identification, which often leads to frequent identity switches. To address these issues, we propose IFMOT, a simple and efficient tracker that combines an interactive perception network with feature optimization. Specifically, the interactive perception network uses a multi-head cross-attention mechanism to alleviate feature conflicts between the two tasks. We then introduce a feature optimization module that refines the target representation to improve the extraction of feature embeddings. Furthermore, a feature integration similarity matrix is used to comprehensively assess the similarity between objects and to handle unreliable similarity matching. Experiments on the MOT16, MOT17, MOT20 and DanceTrack datasets show that the proposed method achieves higher accuracy than other state-of-the-art trackers while maintaining tracking speed.
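To make the cross-attention idea concrete, here is a minimal NumPy sketch of generic multi-head cross-attention in which detection-branch tokens attend to re-identification-branch tokens. It is not the IFMOT architecture, whose details are not given in the abstract. The function name, the token counts, the feature width, the random weights, and the choice of detection features as queries are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(queries, context, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head cross-attention (hypothetical helper, not from the paper).

    queries: (n_q, d) tokens from one branch, e.g. detection features.
    context: (n_c, d) tokens from the other branch, e.g. re-ID features.
    w_q, w_k, w_v, w_o: (d, d) projection matrices.
    Returns the refined queries (n_q, d) and attention weights (heads, n_q, n_c).
    """
    n_q, d = queries.shape
    n_c = context.shape[0]
    assert d % num_heads == 0, "feature width must be divisible by num_heads"
    d_h = d // num_heads

    # Project, then split the feature axis into heads: (heads, tokens, d_h).
    q = (queries @ w_q).reshape(n_q, num_heads, d_h).transpose(1, 0, 2)
    k = (context @ w_k).reshape(n_c, num_heads, d_h).transpose(1, 0, 2)
    v = (context @ w_v).reshape(n_c, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)   # (heads, n_q, n_c)
    attn = softmax(scores, axis=-1)
    heads = attn @ v                                    # (heads, n_q, d_h)

    # Concatenate the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, attn


# Toy inputs: 5 detection tokens and 7 re-ID tokens of width 8 (assumed sizes).
rng = np.random.default_rng(0)
d, num_heads = 8, 2
det_tokens = rng.normal(size=(5, d))
reid_tokens = rng.normal(size=(7, d))
w_q, w_k, w_v, w_o = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))

det_refined, attn = multi_head_cross_attention(
    det_tokens, reid_tokens, w_q, w_k, w_v, w_o, num_heads
)
```

Swapping the two inputs gives the opposite direction, in which re-ID tokens query the detection tokens. A two-way exchange of this kind is one plausible reading of "interactive", but the abstract does not confirm it.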

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

No funding was received for conducting this study.

Author information

Contributions

D. C. and W. R. proposed the idea, designed and performed the simulations, and wrote the paper. C. Y. and B. W. analyzed the data. All authors reviewed the manuscript.

Corresponding author

Correspondence to Changhong Yu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest in this study.

Ethical approval

This research paper does not involve any studies with human participants or animals performed by any authors.

Consent to participate

Not applicable.

Consent for publication

All authors of this manuscript consent to its publication.

Additional information

Communicated by Haojie Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, D., Ren, W., Yu, C. et al. IFMOT: interactive perception and feature optimization network for multi-object tracking. Multimedia Systems 31, 136 (2025). https://doi.org/10.1007/s00530-025-01694-9

  • DOI: https://doi.org/10.1007/s00530-025-01694-9