Abstract
Multi-Object Tracking (MOT), an essential task in computer vision, underperforms when occlusions or motion blur alter object appearances. We develop three modules based on Graph Neural Networks (GNNs) to handle such appearance changes. The appearance enhancement module strengthens appearance features by applying self-attention and a Graph Convolutional Neural Network (GCNN) to local features. The temporal feature updating module automatically updates a tracklet's appearance template using GCNNs with different Laplacian operators. The spatial feature updating module encodes interactive spatial features by combining a graph attention network with a GCNN. After processing the input video frames with these three modules, our tracker stores all extracted features in a memory bank and forwards them to a matching algorithm to complete tracking. On the popular benchmark datasets MOT16, MOT17, and MOT20, we show that introducing GNNs to MOT benefits tracking and that the proposed tracker surpasses state-of-the-art trackers, including StrongSORT, ByteTrack, and BoT-SORT. Specifically, we achieve 81.1% (77.9%) MOTA, 80.3% (77.3%) IDF1, and 65.1% (63.2%) HOTA on the challenging MOT17 (and newest MOT20) datasets.
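All three modules described above rest on graph-convolution propagation over a detection graph. As a rough, generic illustration (not the paper's implementation, which additionally uses self-attention, graph attention, and learned weights), a single GCN layer with the standard symmetric-normalized adjacency can be sketched in NumPy:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization A_hat = D^{-1/2} (A + I) D^{-1/2},
    the propagation operator of a standard GCN layer."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(H, A, W):
    """One graph-convolution step: aggregate neighbor features,
    project with W, then apply ReLU."""
    return np.maximum(normalized_adjacency(A) @ H @ W, 0.0)

# Toy graph of 3 detections: 0-1 and 1-2 interact (e.g. spatially close).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 4))  # per-detection appearance features
W = np.random.default_rng(1).normal(size=(4, 4))  # projection (random here, learned in practice)
H_out = gcn_layer(H, A, W)
print(H_out.shape)  # (3, 4)
```

Each output row mixes a detection's own features with those of its graph neighbors, which is the mechanism the appearance, temporal, and spatial modules exploit (with different graph structures and Laplacian variants).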
Data availability
Data will be made available on request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61771155).
Author information
Contributions
Yubo Zhang: methodology; software; visualization; writing—original draft. Liying Zheng: conceptualization; funding acquisition; resources; writing—review and editing; project administration; supervision. Qingming Huang: supervision; writing—review and editing.
Ethics declarations
Conflict of interest
The authors declare no relevant conflicts of interest.
Additional information
Communicated by Junyu Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Zheng, L. & Huang, Q. Multi-object tracking based on graph neural networks. Multimedia Systems 31, 89 (2025). https://doi.org/10.1007/s00530-025-01679-8