Motion2fusion: real-time volumetric performance capture

Published: 20 November 2017

Abstract

We present Motion2Fusion, a state-of-the-art 360-degree performance capture system that enables *real-time* reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new non-rigid fusion pipeline that reconstructs high-frequency geometric details far more faithfully, avoiding the over-smoothing and visual artifacts observed previously; 2) a high-speed pipeline coupled with a machine learning technique for 3D correspondence field estimation, reducing the tracking errors and artifacts attributed to fast motions; and 3) a backward and forward non-rigid alignment strategy that deals more robustly with topology changes while remaining free of scene priors. Our performance capture system demonstrates real-time results with a speed-up nearing 3x over the previous state-of-the-art work on the exact same GPU hardware. Extensive quantitative and qualitative comparisons show more precise geometric and texturing results, with fewer artifacts due to fast motions or topology changes, than prior art.

Supplementary Material

MP4 File (a246-dou.mp4)

Published In

ACM Transactions on Graphics  Volume 36, Issue 6
December 2017
973 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3130800
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 4D reconstruction
  2. multi-view
  3. nonrigid
  4. real-time

Qualifiers

  • Research-article

Cited By

  • (2025) Reconstructing Complex Shaped Clothing From a Single Image With Feature Stable Unsigned Distance Fields. IEEE Transactions on Visualization and Computer Graphics 31:4 (2142-2154). DOI: 10.1109/TVCG.2024.3381937. Online publication date: Apr-2025
  • (2024) Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos. ACM Transactions on Graphics 43:6 (1-15). DOI: 10.1145/3687926. Online publication date: 19-Dec-2024
  • (2024) VRMM: A Volumetric Relightable Morphable Head Model. ACM SIGGRAPH 2024 Conference Papers (1-11). DOI: 10.1145/3641519.3657406. Online publication date: 13-Jul-2024
  • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum 43:2. DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024
  • (2024) Modeling Realistic Clothing From a Single Image Under Normal Guide. IEEE Transactions on Visualization and Computer Graphics 30:7 (3995-4007). DOI: 10.1109/TVCG.2023.3245583. Online publication date: 1-Jul-2024
  • (2024) EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:8 (5779-5790). DOI: 10.1109/TPAMI.2024.3366148. Online publication date: 1-Aug-2024
  • (2024) GS-SFS: Joint Gaussian Splatting and Shape-From-Silhouette for Multiple Human Reconstruction in Large-Scale Sports Scenes. IEEE Transactions on Multimedia 26 (11095-11110). DOI: 10.1109/TMM.2024.3443637. Online publication date: 2024
  • (2024) 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (20675-20685). DOI: 10.1109/CVPR52733.2024.01954. Online publication date: 16-Jun-2024
  • (2024) HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (19734-19745). DOI: 10.1109/CVPR52733.2024.01866. Online publication date: 16-Jun-2024
  • (2024) I'M HOI: Inertia-Aware Monocular Capture of 3D Human-Object Interactions. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (729-741). DOI: 10.1109/CVPR52733.2024.00076. Online publication date: 16-Jun-2024
