VGT-MOT: visibility-guided tracking for online multiple-object tracking

  • Original Paper
Machine Vision and Applications

Abstract

Multi-object tracking (MOT) is an important computer vision task with a wide range of applications. Most existing MOT methods use the Kalman filter to predict each object's location in the next frame. However, the Kalman filter may fail when the video is captured by a camera with significant motion variation or contains objects moving at non-constant speed. In addition, although object occlusion has been studied extensively in MOT, it has not yet been well addressed. To deal with these problems, this paper proposes a joint detection and tracking method named visibility-guided tracking for MOT (VGT-MOT). Specifically, to cope with the difficulty of accurately estimating object positions under drastic camera or object motion variation, VGT-MOT uses an adjacent-frame object location prediction network with inter-frame attention to predict the target position in the next frame. To handle object occlusion, VGT-MOT uses object visibility as a dynamic weight to adaptively fuse the motion and appearance similarities and to update the object appearance representation. VGT-MOT has been evaluated on the MOT16, MOT17 and MOT20 datasets, and the results show that it compares favorably against state-of-the-art MOT approaches. The source code of the proposed method is available at https://github.com/wang-ironman/VGT-MOT.
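
The visibility-guided fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the convex combination weighted by visibility, the visibility-scaled moving-average update, and the names `fuse_similarities`, `update_appearance` and `max_rate` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def fuse_similarities(motion_sim, appearance_sim, visibility):
    """Blend motion and appearance similarities per detection.

    motion_sim, appearance_sim: (num_tracks, num_detections) arrays in [0, 1].
    visibility: (num_detections,) array in [0, 1]; 1 means fully visible.
    ASSUMPTION: a convex combination weighted by visibility. The paper's
    actual fusion rule may differ.
    """
    motion_sim = np.asarray(motion_sim, dtype=float)
    appearance_sim = np.asarray(appearance_sim, dtype=float)
    # Broadcast one weight per detection (column) across all tracks (rows).
    w = np.clip(np.asarray(visibility, dtype=float), 0.0, 1.0)[None, :]
    # Visible detections trust appearance; occluded ones fall back to motion.
    return w * appearance_sim + (1.0 - w) * motion_sim


def update_appearance(track_feat, det_feat, visibility, max_rate=0.1):
    """Moving-average appearance update scaled by visibility.

    ASSUMPTION: the update rate is max_rate * visibility, so a fully
    occluded detection leaves the stored track feature unchanged.
    """
    rate = max_rate * float(np.clip(visibility, 0.0, 1.0))
    track_feat = np.asarray(track_feat, dtype=float)
    det_feat = np.asarray(det_feat, dtype=float)
    feat = (1.0 - rate) * track_feat + rate * det_feat
    # Re-normalise so cosine similarity stays well defined.
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```

Under these assumptions, a heavily occluded detection is matched mostly on predicted position and barely changes the stored appearance feature, which is one plausible way to keep an occluder's appearance from contaminating a track.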

Author information

Corresponding author

Correspondence to Lu Wang.

About this article

Cite this article

Wang, S., Li, WX., Wang, L. et al. VGT-MOT: visibility-guided tracking for online multiple-object tracking. Machine Vision and Applications 34, 50 (2023). https://doi.org/10.1007/s00138-023-01398-y

  • DOI: https://doi.org/10.1007/s00138-023-01398-y