Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer

  • Original Article
Neural Computing and Applications

Abstract

The performance of Multi-Object Tracking (MOT) has recently improved through the use of discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in unsatisfactory tracking performance. We therefore design an end-to-end MOT model based on Graph Convolutional Neural Networks (GCNNs) that fuses four classes of features characterizing objects: their appearances, motions, appearance interactions, and motion interactions. Specifically, a Re-Identification (Re-ID) module extracts more discriminative appearance features, and the appearance features of each object tracklet are averaged to simplify the tracker. To better distinguish objects, we design two GCNNs: one extracts interactive appearance features and the other extracts interactive motion features. A fusion module then combines these features into a global feature similarity, from which an association component computes the MOT matching results. Finally, we use GNNExplainer to semantically visualize the relevant structures and gain insight into the proposed tracker. Evaluation on the MOT16 and MOT17 benchmarks shows that our model outperforms state-of-the-art online tracking methods in terms of Multi-Object Tracking Accuracy and Identification F1 score, which is consistent with the results from GNNExplainer.
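
The pipeline summarized in the abstract (average the tracklet appearance features, fuse appearance and motion cues into one similarity, then associate) can be illustrated with a minimal sketch. This is not the authors' implementation: the two GCNNs and the learned fusion module are not modelled here. As stated assumptions, appearance similarity is a plain cosine similarity, the motion cue is replaced by bounding-box IoU, fusion is a fixed weighted sum with a hypothetical weight `alpha`, and association is a simple greedy matcher with a hypothetical `threshold`. All function names are illustrative.

```python
import math

def mean_feature(tracklet_feats):
    """Average the per-frame appearance vectors of one tracklet."""
    n = len(tracklet_feats)
    return [sum(col) / n for col in zip(*tracklet_feats)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def fused_similarity(track_feats, track_boxes, det_feats, det_boxes, alpha=0.5):
    """Assumed fusion: weighted sum of appearance (cosine) and motion (IoU) cues."""
    return [[alpha * cosine(tf, df) + (1 - alpha) * iou(tb, db)
             for df, db in zip(det_feats, det_boxes)]
            for tf, tb in zip(track_feats, track_boxes)]

def greedy_match(sim, threshold=0.5):
    """Greedy association: take the highest-scoring free pairs above threshold."""
    pairs = sorted(((s, t, d) for t, row in enumerate(sim)
                    for d, s in enumerate(row)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for s, t, d in pairs:
        if s < threshold:
            break  # remaining pairs score even lower
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return sorted(matches)

# Toy data: two tracklets (two frames each) and two new detections.
tracklets = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0], [0.2, 0.8]]]
track_feats = [mean_feature(t) for t in tracklets]
track_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
det_feats = [[0.1, 1.0], [1.0, 0.1]]
det_boxes = [(21, 0, 31, 10), (1, 0, 11, 10)]

sim = fused_similarity(track_feats, track_boxes, det_feats, det_boxes)
matches = greedy_match(sim)
print(matches)  # [(0, 1), (1, 0)]: tracklet 0 -> detection 1, tracklet 1 -> detection 0
```

Averaging collapses each tracklet to a single vector, so the similarity matrix has one row per tracklet rather than one per stored frame; that is the simplification the abstract refers to. The greedy matcher is only a stand-in for the paper's association component, which the abstract does not specify.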

Data availability

All data used in this study come from public datasets. The MOT16 and MOT17 datasets are available at https://motchallenge.net. The CrowdHuman dataset is available at http://www.crowdhuman.org/. The CityPersons dataset is available at https://doi.org/10.1109/CVPR.2017.474. The ETHZ dataset is available at https://doi.org/10.1109/CVPR.2008.4587581. The Market-1501 dataset is available at https://doi.org/10.1109/ICCV.2015.133. The CUHK03 dataset is available at https://doi.org/10.1109/CVPR.2014.27. The DukeMTMC dataset is available at https://doi.org/10.1007/978-3-319-48881-3_2.

Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2021YFF0603904) and the National Natural Science Foundation of China (No. 61771155).

Author information

Corresponding author

Correspondence to Liying Zheng.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Huang, Q. & Zheng, L. Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer. Neural Comput & Applic 36, 13799–13814 (2024). https://doi.org/10.1007/s00521-024-09773-0

  • DOI: https://doi.org/10.1007/s00521-024-09773-0