MonoPerfCap: Human Performance Capture From Monocular Video

Authors:

Avishek Chatterjee,

Michael Zollhöfer,

Dushyant Mehta,

Hans-Peter Seidel,

Christian Theobalt

ACM Transactions on Graphics (TOG), Volume 37, Issue 2

Article No.: 27, Pages 1 - 15

https://doi.org/10.1145/3181973

Published: 21 May 2018

Abstract

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and scene complexity that can be handled.
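
To illustrate the idea of constraining per-batch motion to a low-dimensional trajectory subspace, the following minimal sketch projects one noisy joint coordinate, observed over a batch of frames, onto the first few vectors of a discrete cosine transform (DCT) basis. The abstract does not name the basis, so the DCT choice, the function names `dct_basis` and `project_to_subspace`, and the batch length and coefficient count are assumptions made for illustration only, not the authors' implementation.

```python
import math


def dct_basis(num_frames, num_coeffs):
    """Return the first num_coeffs orthonormal DCT-II basis vectors.

    Each vector has length num_frames. Using DCT here is an assumption
    for illustration; the abstract only says "low-dimensional
    trajectory subspace".
    """
    basis = []
    for k in range(num_coeffs):
        # Scaling that makes the basis vectors orthonormal.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def project_to_subspace(trajectory, num_coeffs):
    """Fit a 1D trajectory in the span of the low-frequency basis.

    Because the basis is orthonormal, the least-squares coefficients
    are plain inner products with the basis vectors.
    """
    basis = dct_basis(len(trajectory), num_coeffs)
    coeffs = [sum(b * x for b, x in zip(vec, trajectory)) for vec in basis]
    smoothed = [
        sum(c * vec[t] for c, vec in zip(coeffs, basis))
        for t in range(len(trajectory))
    ]
    return coeffs, smoothed


if __name__ == "__main__":
    # Hypothetical batch: 50 frames of one joint coordinate with
    # frame-to-frame jitter added to a smooth motion.
    frames = 50
    clean = [math.cos(math.pi * (2 * t + 1) / (2 * frames)) for t in range(frames)]
    noisy = [c + 0.1 * (-1) ** t for t, c in enumerate(clean)]
    coeffs, smoothed = project_to_subspace(noisy, num_coeffs=4)
    print("coefficients kept per coordinate:", len(coeffs))
```

The design point this sketch is meant to convey is that a whole batch of frames is described by a handful of coefficients per coordinate, so per-frame detection noise that does not fit a smooth trajectory is suppressed by construction.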

Supplementary Material

xu.zip (77.03 MB)

Supplemental movie and image files for MonoPerfCap: Human Performance Capture From Monocular Video

tog37-2-a27-xu.mp4 (MP4 file, 295.39 MB)

Index Terms

MonoPerfCap: Human Performance Capture From Monocular Video
1. Computing methodologies
  1. Computer graphics
    1. Animation
      1. Motion capture

Published In

ACM Transactions on Graphics Volume 37, Issue 2

April 2018

244 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/3191713

Editor:
Kavita Bala
Cornell University

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2018

Accepted: 01 February 2018

Revised: 01 February 2018

Received: 01 September 2017

Published in TOG Volume 37, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

ERC Starting Grant project CapReal

Bibliometrics

Total Citations: 164
Total Downloads: 996
Downloads (last 12 months): 61
Downloads (last 6 weeks): 6

Reflects downloads up to 13 Feb 2025
