Benchmarking the Complementary-View Multi-human Association and Tracking

International Journal of Computer Vision

Abstract

Using multiple moving cameras with different and time-varying views can significantly expand the capability of multi-human tracking, covering larger areas and more diverse perspectives. In particular, moving cameras with complementary top and horizontal views can facilitate multi-human detection and tracking from both global and local perspectives. This is a new and challenging problem that has drawn increasing attention in recent years, and one main obstacle is the lack of a comprehensive dataset for credible performance evaluation. In this paper, we present such a new dataset consisting of videos synchronously recorded by drone and wearable cameras, with high-quality annotations of the covered subjects and their cross-frame and cross-view associations. We also propose a pertinent baseline algorithm for multi-view multi-human tracking and evaluate it on this new dataset against the annotated ground truths. Experimental results verify the usefulness of the new dataset and the effectiveness of the proposed baseline algorithm.
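At its core, cross-view association of the kind described above can be framed as a minimum-cost one-to-one assignment: given pairwise distances (appearance or spatial) between subjects detected in the top view and subjects detected in a horizontal view, find the matching with the lowest total cost. The sketch below is purely illustrative and is not the paper's algorithm: it brute-forces the optimal assignment over a small toy distance matrix with standard-library Python. A practical system would instead use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) with learned affinities.

```python
from itertools import permutations

def associate(cost):
    """Brute-force minimum-cost one-to-one matching for a small square
    cost matrix. Rows index top-view subjects, columns index
    horizontal-view subjects; lower cost means more similar.
    Returns the list of (row, col) matches and the total cost."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    # Enumerate all one-to-one assignments (only feasible for small n).
    for perm in permutations(range(n)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total

# Toy cross-view distance matrix (hypothetical values).
cost = [
    [0.2, 0.9, 0.8],
    [0.7, 0.1, 0.6],
    [0.9, 0.8, 0.3],
]
matches, total = associate(cost)
print(matches)  # [(0, 0), (1, 1), (2, 2)]
print(round(total, 1))  # 0.6
```

The brute-force search is exponential in the number of subjects, which is why real multi-view trackers rely on polynomial-time assignment solvers; the toy version above only serves to make the matching objective concrete.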






Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants U1803264 and 62072334. The authors would like to thank their team members, especially Jiewen Zhao for his technical assistance with the implementation, and Liqiang Yin, Yiyang Gan, Yun Wang, Jiacheng Li, Sibo Wang, Shuai Wang, Songmiao Wang, and Likai Wang for their kind assistance in the collection and annotation of this dataset.

Author information

Corresponding author

Correspondence to Wei Feng.

Additional information

Communicated by Matteo Poggi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Han, R., Feng, W., Wang, F. et al. Benchmarking the Complementary-View Multi-human Association and Tracking. Int J Comput Vis 132, 118–136 (2024). https://doi.org/10.1007/s11263-023-01857-z

