research-article

Flycon: real-time environment-independent multi-view human pose estimation with aerial vehicles

Authors:
Tobias Nägeli, AIT Lab, ETH Zurich
Samuel Oberholzer, AIT Lab, ETH Zurich
Silvan Plüss, AIT Lab, ETH Zurich
Javier Alonso-Mora, Delft University of Technology
Otmar Hilliges, AIT Lab, ETH Zurich

ACM Transactions on Graphics, Volume 37, Issue 6, Article No. 182, pp. 1–14
https://doi.org/10.1145/3272127.3275022

Published: 04 December 2018


Abstract

We propose a real-time method for the infrastructure-free estimation of articulated human motion. The approach leverages a swarm of camera-equipped flying robots and jointly optimizes the swarm's state and the subject's skeletal state, which includes the 3D joint positions and a set of bones. Our method makes it possible to track the motion of human subjects, for example an athlete, over long time horizons and long distances, in challenging settings and at large scale, where fixed-infrastructure approaches are not applicable. The proposed algorithm uses active infrared markers, runs in real time, and accurately estimates robot and human pose parameters online without requiring accurately calibrated or stationary-mounted cameras. Our method i) estimates a global coordinate frame for the micro aerial vehicle (MAV) swarm, ii) jointly optimizes the human pose and relative camera positions, and iii) estimates the lengths of the human bones. The entire swarm is then controlled via a model predictive controller to maximize visibility of the subject from multiple viewpoints, even under fast motion such as jumping or jogging. We demonstrate our method in a number of difficult scenarios, including capture of long locomotion sequences at the scale of a triplex gym, in non-planar terrain, while climbing, and in outdoor scenarios.
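The abstract does not spell out the estimator, so the following is only a rough illustration of two ingredients it names: fusing several MAV views of one infrared marker into a 3D point, and estimating a bone length from triangulated joints. It is a minimal Python sketch under a loud assumption: camera centers and viewing-ray directions toward each detected marker are taken as already known in a common frame, whereas the paper estimates camera positions jointly with the human pose. The least-squares ray intersection and the median-of-distances bone length are standard techniques swapped in for illustration, not the authors' optimizer, and the function names `triangulate_rays` and `estimate_bone_length` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _det3(m: List[List[float]]) -> float:
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(centers: Sequence[Vec3], directions: Sequence[Vec3]) -> Vec3:
    """Least-squares point closest to all rays c_i + t * d_i.

    ASSUMPTION (illustrative only): camera centers c_i and ray directions d_i
    toward the marker are known in one shared frame. Solves
    (sum_i P_i) x = sum_i P_i c_i with P_i = I - d_i d_i^T, which minimizes the
    summed squared perpendicular distance from x to every ray.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0, 0.0, 0.0]
    for c, d in zip(centers, directions):
        d = _normalize(d)
        for r in range(3):
            for k in range(3):
                p = (1.0 if r == k else 0.0) - d[r] * d[k]  # entry of P_i
                A[r][k] += p
                b[r] += p * c[k]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point is not observable")
    x = []
    for col in range(3):
        # Cramer's rule: replace column `col` of A with b.
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        x.append(_det3(m) / det)
    return (x[0], x[1], x[2])


def estimate_bone_length(parent: Sequence[Vec3], child: Sequence[Vec3]) -> float:
    """Median of per-frame distances between two triangulated joints (hypothetical helper)."""
    return statistics.median(math.dist(p, q) for p, q in zip(parent, child))
```

For example, three cameras at `(0, 0, 0)`, `(5, 0, 0)` and `(0, 5, 0)` whose rays all pass through `(1, 2, 3)` recover that point up to floating-point error. The median is used for the bone length so that a few frames with a mis-triangulated marker do not bias the estimate; at least two non-parallel rays are needed, otherwise the normal matrix is singular and the function raises.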

Supplemental Material

a182-nageli.mp4 (MP4 video, 241.1 MB)


Index Terms

Computing methodologies → Artificial intelligence


Published in ACM Transactions on Graphics, Volume 37, Issue 6 (December 2018), 1401 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3272127
Editor: Takeo Igarashi, The University of Tokyo, Japan
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
human pose estimation
robotics
Article Metrics
Total citations: 19
Total downloads: 536
Downloads (last 12 months): 58
Downloads (last 6 weeks): 7
