DOI: 10.1145/3394171.3413734
research-article

Neural3D: Light-weight Neural Portrait Scanning via Context-aware Correspondence Learning

Published: 12 October 2020

ABSTRACT

Reconstructing a human portrait realistically and conveniently is critical for human modeling and understanding. Aiming at light-weight and realistic human portrait reconstruction, we propose Neural3D, a novel neural human portrait scanning system that uses only a single RGB camera. To enable accurate camera pose estimation, we propose a context-aware correspondence learning approach that jointly models the appearance, spatial, and motion information between feature pairs. To enable realistic reconstruction and suppress geometry errors, we further adopt a point-based neural rendering scheme that generates realistic and immersive portrait visualizations from arbitrary virtual viewpoints. By introducing these learning-based components into a purely RGB-based human modeling framework, we achieve both accurate camera pose estimation and realistic free-viewpoint rendering of the reconstructed human portrait. Extensive experiments on a variety of challenging capture scenarios demonstrate the robustness and effectiveness of our approach.
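
The abstract does not give the network design, so the following is only a minimal sketch of what "jointly modeling appearance, spatial and motion information between feature pairs" could look like as an input encoding. Everything in it is an assumption made for illustration: the function names `build_pair_features` and `context_normalize`, the use of cosine similarity as the appearance cue, and the per-column normalization as a stand-in for "context" are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_pair_features(kp_a, kp_b, desc_a, desc_b):
    """Stack appearance, spatial and motion cues for N putative matches.

    kp_a, kp_b:     (N, 2) keypoint pixel coordinates in image A and image B.
    desc_a, desc_b: (N, D) local descriptors of the matched keypoints.
    Returns an (N, 7) array: [appearance | spatial (4) | motion (2)].
    Hypothetical encoding for illustration, not the authors' method.
    """
    kp_a = np.asarray(kp_a, dtype=np.float64)
    kp_b = np.asarray(kp_b, dtype=np.float64)
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)
    eps = 1e-8

    # Appearance cue: cosine similarity between the two descriptors, (N, 1).
    num = np.sum(desc_a * desc_b, axis=1)
    den = np.linalg.norm(desc_a, axis=1) * np.linalg.norm(desc_b, axis=1) + eps
    appearance = (num / den)[:, None]

    # Spatial cue: where each keypoint sits in its own image, (N, 4).
    spatial = np.concatenate([kp_a, kp_b], axis=1)

    # Motion cue: displacement of the keypoint from image A to image B, (N, 2).
    motion = kp_b - kp_a

    return np.concatenate([appearance, spatial, motion], axis=1)


def context_normalize(features, eps=1e-8):
    """Express each match relative to all matches of the same image pair.

    Subtracts the per-column mean and divides by the per-column standard
    deviation computed over the N matches (an assumed form of "context").
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)
```

In such a setup, the normalized `(N, 7)` array would be fed to a learned per-match classifier that scores each correspondence as inlier or outlier before camera pose estimation; that classifier is outside the scope of this sketch.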

Supplemental Material

3394171.3413734.mp4 (mp4, 51.3 MB)


    • Published in

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
      ISBN: 9781450379885
      DOI: 10.1145/3394171

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
