Abstract
The tracking performance of Multi-Object Tracking (MOT) has recently been improved by using discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in unsatisfactory tracking performance. We therefore design an end-to-end MOT model based on Graph Convolutional Neural Networks (GCNNs) that fuses four classes of features characterizing objects by their appearance, motion, appearance interactions, and motion interactions. Specifically, a Re-Identification (Re-ID) module extracts discriminative appearance features, and the appearance features within each object tracklet are averaged to simplify the proposed tracker. We then design two GCNNs to better distinguish objects: one extracts interactive appearance features, and the other extracts interactive motion features. A fusion module combines these features into a global feature similarity, from which an association component computes the MOT matching results. Finally, we semantically visualize the relevant graph structures with GNNExplainer to gain insight into the proposed tracker. Evaluation on the MOT16 and MOT17 benchmarks shows that our model outperforms state-of-the-art online tracking methods in terms of Multi-Object Tracking Accuracy and Identification F1 score, which is consistent with the GNNExplainer results.
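The pipeline summarized above (tracklet appearance averaging, GCN-based interactive features, similarity fusion, and association) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names, the single symmetric-normalized GCN layer, the fixed fusion weights, and the greedy matcher are all simplifying assumptions (the paper's fusion and association components are learned end-to-end).

```python
import numpy as np

def tracklet_appearance(feats):
    """Average the per-frame Re-ID embeddings of a tracklet into one vector
    (mirrors the tracklet-averaging simplification described in the abstract)."""
    return np.mean(feats, axis=0)

def gcn_layer(X, A, W):
    """One graph-convolution step: a symmetric-normalized adjacency matrix
    aggregates neighbor features, yielding 'interactive' features."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

def cosine_sim(T, D):
    """Cosine similarity between tracklet features T and detection features D."""
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return T @ D.T

def fuse(sims, weights):
    """Weighted fusion of per-cue similarity matrices (appearance, motion,
    interactive appearance, interactive motion) into one global similarity."""
    return sum(w * s for w, s in zip(weights, sims))

def greedy_associate(S, thresh=0.3):
    """Greedy matching on the fused similarity; a stand-in for the learned
    association component in the paper."""
    matches, used = [], set()
    for t in np.argsort(-S.max(axis=1)):                # strongest rows first
        row = S[t].copy()
        if used:
            row[list(used)] = -np.inf                   # mask taken detections
        d = int(np.argmax(row))
        if row[d] > thresh:
            matches.append((int(t), d))
            used.add(d)
    return matches
```

In a real tracker the adjacency `A` would encode spatial or temporal proximity between objects and `W` would be learned; here they are placeholders to show the data flow from per-cue similarities to a single assignment.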
Data availability
All data used during this study are public datasets. The MOT16 and MOT17 datasets are available at https://motchallenge.net. The CrowdHuman dataset is available at http://www.crowdhuman.org/. The CityPersons dataset is available at https://doi.org/10.1109/CVPR.2017.474. The ETHZ dataset is available at https://doi.org/10.1109/CVPR.2008.4587581. The Market1501 dataset is available at https://doi.org/10.1109/ICCV.2015.133. The CUHK03 dataset is available at https://doi.org/10.1109/CVPR.2014.27. The DukeMTMC dataset is available at https://doi.org/10.1007/978-3-319-48881-3_2.
Acknowledgements
This work is supported by the National Key R&D Program of China (No. 2021YFF0603904) and the National Natural Science Foundation of China (No. 61771155).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, Y., Huang, Q. & Zheng, L. Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer. Neural Comput & Applic 36, 13799–13814 (2024). https://doi.org/10.1007/s00521-024-09773-0