ABSTRACT
Reconstructing a human portrait realistically and conveniently is critical for human modeling and understanding. Aiming at light-weight and realistic human portrait reconstruction, we propose Neural3D, a novel neural human portrait scanning system that uses only a single RGB camera. To enable accurate pose estimation, we propose a context-aware correspondence learning approach that jointly models the appearance, spatial, and motion information between feature pairs. To enable realistic reconstruction and suppress geometry errors, we further adopt a point-based neural rendering scheme that generates realistic and immersive portrait visualization from arbitrary virtual viewpoints. By introducing these learning-based components into a pure RGB-based human modeling framework, we achieve both accurate camera pose estimation and realistic free-viewpoint rendering of the reconstructed human portrait. Extensive experiments on a variety of challenging capture scenarios demonstrate the robustness and effectiveness of our approach.
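As a rough illustration of the rasterization step that a point-based neural rendering scheme typically builds on, the sketch below projects colored 3D points into a virtual pinhole camera and resolves occlusion with a z-buffer. It is not the Neural3D implementation: the function name `splat_points`, its parameters, and the use of plain RGB colors are assumptions made for illustration. In point-based neural rendering, points would usually carry learned descriptors, and a network would turn the rasterized result into the final image; that network is omitted here.

```python
import numpy as np

def splat_points(points_world, colors, K, R, t, height, width):
    """Rasterize colored 3D points into a pinhole camera using a z-buffer.

    Hypothetical helper for illustration only.
    points_world: (N, 3) points, colors: (N, 3) values per point,
    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3,) translation.
    Returns the (height, width, 3) image and the (height, width) depth map.
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    cam = points_world @ R.T + t

    image = np.zeros((height, width, 3), dtype=np.float32)
    depth = np.full((height, width), np.inf, dtype=np.float32)

    # Discard points behind (or on) the camera plane
    in_front = cam[:, 2] > 1e-6
    cam, colors = cam[in_front], colors[in_front]

    # Perspective projection to pixel coordinates
    uvw = cam @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)

    # Z-buffer test: a point is kept only if it is nearer than
    # whatever is already stored at that pixel
    for i in np.flatnonzero(inside):
        if cam[i, 2] < depth[v[i], u[i]]:
            depth[v[i], u[i]] = cam[i, 2]
            image[v[i], u[i]] = colors[i]
    return image, depth
```

Changing `R` and `t` moves the virtual viewpoint, which is the sense in which such a representation supports free-viewpoint rendering. One pixel per point leaves holes between points; handling those holes is one of the jobs left to the rendering network in a learned pipeline.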
Index Terms
- Neural3D: Light-weight Neural Portrait Scanning via Context-aware Correspondence Learning