Abstract
The tracking performance of Multi-Object Tracking (MOT) has recently been improved by using discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in unsatisfactory tracking performance. We therefore design an end-to-end MOT model based on Graph Convolutional Neural Networks (GCNNs) that fuses four classes of features characterizing objects by their appearance, motion, appearance interactions, and motion interactions. Specifically, a Re-Identification (Re-ID) module extracts discriminative appearance features, and the appearance features within each object tracklet are averaged to simplify the proposed tracker. We then design two GCNNs to better distinguish objects: one extracts interactive appearance features, and the other extracts interactive motion features. A fusion module combines these features into a global feature similarity, from which an association component computes the MOT matching results. Finally, we semantically visualize the relevant graph structures with GNNExplainer to gain insight into the proposed tracker. Evaluation on the MOT16 and MOT17 benchmarks shows that our model outperforms state-of-the-art online tracking methods in terms of Multi-Object Tracking Accuracy and Identification F1 score, which is consistent with the GNNExplainer results.
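The pipeline summarized above (tracklet appearance averaging, GCN-based interactive features, similarity fusion, and association) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names, the single symmetric-normalized GCN layer, the fixed fusion weights, and the greedy matcher are all simplifying assumptions (the paper's fusion and association components are learned end-to-end).

```python
import numpy as np

def tracklet_appearance(feats):
    """Average the per-frame Re-ID embeddings of a tracklet into one vector
    (mirrors the tracklet-averaging simplification described in the abstract)."""
    return np.mean(feats, axis=0)

def gcn_layer(X, A, W):
    """One graph-convolution step: a symmetric-normalized adjacency matrix
    aggregates neighbor features, yielding 'interactive' features."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

def cosine_sim(T, D):
    """Cosine similarity between tracklet features T and detection features D."""
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return T @ D.T

def fuse(sims, weights):
    """Weighted fusion of per-cue similarity matrices (appearance, motion,
    interactive appearance, interactive motion) into one global similarity."""
    return sum(w * s for w, s in zip(weights, sims))

def greedy_associate(S, thresh=0.3):
    """Greedy matching on the fused similarity; a stand-in for the learned
    association component in the paper."""
    matches, used = [], set()
    for t in np.argsort(-S.max(axis=1)):                # strongest rows first
        row = S[t].copy()
        if used:
            row[list(used)] = -np.inf                   # mask taken detections
        d = int(np.argmax(row))
        if row[d] > thresh:
            matches.append((int(t), d))
            used.add(d)
    return matches
```

In a real tracker the adjacency `A` would encode spatial or temporal proximity between objects and `W` would be learned; here they are placeholders to show the data flow from per-cue similarities to a single assignment.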
Data availability
All data used during this study are public datasets. The MOT16 and MOT17 datasets are available at https://motchallenge.net. The CrowdHuman dataset is available at http://www.crowdhuman.org/. The CityPersons dataset is available at https://doi.org/10.1109/CVPR.2017.474. The ETHZ dataset is available at https://doi.org/10.1109/CVPR.2008.4587581. The Market1501 dataset is available at https://doi.org/10.1109/ICCV.2015.133. The CUHK03 dataset is available at https://doi.org/10.1109/CVPR.2014.27. The DukeMTMC dataset is available at https://doi.org/10.1007/978-3-319-48881-3_2.
Acknowledgements
This work is supported by the National Key R&D Program of China (No. 2021YFF0603904) and the National Natural Science Foundation of China (No. 61771155).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, Y., Huang, Q. & Zheng, L. Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer. Neural Comput & Applic 36, 13799–13814 (2024). https://doi.org/10.1007/s00521-024-09773-0