Abstract
Multi-object tracking (MOT) is an important computer vision task with a wide range of applications. Most existing MOT methods use the Kalman filter to predict each object's location in the next frame. However, the Kalman filter may fail when the video is captured by a camera with significant motion variation or contains objects moving at non-constant speed. In addition, although object occlusion has been studied extensively in MOT, it has not yet been well addressed. To deal with these problems, this paper proposes a joint detection and tracking method named visibility-guided tracking for MOT (VGT-MOT). Specifically, to cope with the difficulty of accurately estimating object positions under drastic camera or object motion variation, VGT-MOT uses an adjacent-frame object location prediction network with inter-frame attention to predict the target position in the next frame. To handle object occlusion, VGT-MOT uses object visibility as a dynamic weight to adaptively fuse the motion and appearance similarities and to update the object appearance representation. VGT-MOT has been evaluated on the MOT16, MOT17 and MOT20 datasets, and the results show that it compares favorably against state-of-the-art MOT approaches. The source code of the proposed method is available at https://github.com/wang-ironman/VGT-MOT.
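The idea of using visibility as a dynamic weight can be illustrated with a minimal Python sketch. It is not the formulation used in VGT-MOT, which the abstract does not specify. It assumes that visibility is a score in [0, 1], that a highly visible object should rely more on appearance similarity while an occluded one should rely more on motion similarity, and that the appearance representation is updated by a visibility-scaled moving average. The names `fuse_similarity`, `update_appearance` and `max_rate` are hypothetical.

```python
from typing import List


def _clamp01(value: float) -> float:
    """Clamp a score to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def fuse_similarity(motion_sim: float, appearance_sim: float, visibility: float) -> float:
    """Blend motion and appearance similarity with visibility as the weight.

    Assumption (not from the paper): higher visibility shifts the weight
    toward appearance, lower visibility shifts it toward motion.
    """
    v = _clamp01(visibility)
    return v * appearance_sim + (1.0 - v) * motion_sim


def update_appearance(
    track_feat: List[float],
    det_feat: List[float],
    visibility: float,
    max_rate: float = 0.5,
) -> List[float]:
    """Update a track's appearance feature by a visibility-scaled moving average.

    Assumption (not from the paper): a fully occluded detection (visibility 0)
    leaves the stored feature unchanged, and a fully visible one moves it
    toward the new feature at rate `max_rate`.
    """
    rate = max_rate * _clamp01(visibility)
    return [(1.0 - rate) * t + rate * d for t, d in zip(track_feat, det_feat)]


if __name__ == "__main__":
    # A heavily occluded detection: the fused score stays close to the motion cue.
    print(fuse_similarity(motion_sim=0.9, appearance_sim=0.3, visibility=0.1))
    # A clearly visible detection: the stored feature moves toward the new one.
    print(update_appearance([1.0, 0.0], [0.0, 1.0], visibility=1.0))
```

Under these assumptions, an occluded detection neither dominates the association score through an unreliable appearance feature nor contaminates the stored representation of its track.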
About this article
Cite this article
Wang, S., Li, WX., Wang, L. et al. VGT-MOT: visibility-guided tracking for online multiple-object tracking. Machine Vision and Applications 34, 50 (2023). https://doi.org/10.1007/s00138-023-01398-y
DOI: https://doi.org/10.1007/s00138-023-01398-y