Recursive Bayesian Filtering for Multiple Human Pose Tracking from Multiple Cameras

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12623)

Abstract

Markerless motion capture allows the extraction of multiple 3D human poses from natural scenes, without the need for a controlled but artificial studio environment or expensive hardware. In this work, we present a novel tracking algorithm that builds on recent advances in 2D human pose estimation and 3D human motion anticipation. During the prediction step, an RNN forecasts a set of plausible future poses; during the update step, a 2D multiple human pose estimation model incorporates the observations. Casting the problem of estimating multiple persons from multiple cameras as a tracking problem rather than an association problem results in a runtime that grows linearly with the number of tracked persons. Furthermore, tracking enables our method to overcome temporary occlusions by relying on the prediction model. Our approach achieves state-of-the-art results on popular benchmarks for 3D human pose estimation and tracking.
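
The predict/update recursion described in the abstract can be illustrated with a generic particle filter. The following is only a minimal sketch under strong simplifying assumptions, not the method of the paper: each person is reduced to a single 3D point, a Gaussian random walk stands in for the RNN forecaster, the cameras are hypothetical orthographic projections, and each particle is scored against the nearest 2D detection in every view. All function names and parameters (predict, update, noise, sigma, and so on) are illustrative.

```python
import math
import random

def predict(particles, rng, noise=0.05):
    """Prediction step: diffuse each 3D particle with Gaussian noise.
    Assumption: a random walk stands in for the RNN motion forecaster."""
    return [tuple(c + rng.gauss(0.0, noise) for c in p) for p in particles]

def project(point, axes):
    """Hypothetical orthographic camera that keeps two of the three axes."""
    return (point[axes[0]], point[axes[1]])

def update(particles, detections, cameras, sigma=0.1):
    """Update step: weight each particle by a Gaussian likelihood of the
    nearest 2D detection in every camera. A camera without detections
    (for example during an occlusion) contributes nothing."""
    log_weights = []
    for p in particles:
        log_w = 0.0
        for axes, dets in zip(cameras, detections):
            if not dets:
                continue
            u, v = project(p, axes)
            err2 = min((u - x) ** 2 + (v - y) ** 2 for x, y in dets)
            log_w -= err2 / (2.0 * sigma ** 2)
        log_weights.append(log_w)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

def resample(particles, weights, rng):
    """Draw a new particle set in proportion to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

def estimate(particles):
    """Point estimate: mean of the particle set."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(3))

def track(tracks, frames, cameras, rng):
    """Run one independent filter per person over all frames.
    tracks: dict person_id -> list of 3D particles.
    frames: list of per-camera detection lists, one entry per frame."""
    for detections in frames:
        for pid in tracks:  # one pass per person: cost linear in persons
            particles = predict(tracks[pid], rng)
            weights = update(particles, detections, cameras)
            tracks[pid] = resample(particles, weights, rng)
    return {pid: estimate(ps) for pid, ps in tracks.items()}
```

Because each person is handled by a separate filter, the loop over persons in track makes the linear dependence of the runtime on the number of tracked persons explicit. In a frame without detections, all weights are equal, so the estimate is carried by the prediction step alone.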

Notes

  1. Assuming the z axis points upwards.

Acknowledgment

The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2070–390732324, GA 1927/8-1, and the ERC Starting Grant ARCA (677650).

Author information

Corresponding author

Correspondence to Julian Tanke.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kwon, OH., Tanke, J., Gall, J. (2021). Recursive Bayesian Filtering for Multiple Human Pose Tracking from Multiple Cameras. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_27

  • DOI: https://doi.org/10.1007/978-3-030-69532-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science, Computer Science (R0)
