
DEGAN: Detail-Enhanced Generative Adversarial Network for Monocular Depth-Based 3D Reconstruction

Published: 18 November 2024

Abstract

Although deep network-based 3D reconstruction methods can recover 3D geometry from few inputs, they may produce unfaithful reconstructions when predicting the occluded parts of 3D objects. To address this issue, we propose the Detail-Enhanced Generative Adversarial Network (DEGAN), which consists of an Encoder–Decoder-Based Generator (EDGen) and a Voxel-Point Embedding Network-Based Discriminator (VPDis), for 3D reconstruction from a monocular depth image of an object. First, EDGen decodes features from the 2.5D voxel grid representation of an input depth image and generates a 3D occupancy grid under GAN losses and a sampling point loss; the sampling point loss improves the accuracy of predicted points with high uncertainty. VPDis then helps reconstruct details under voxel and point adversarial losses, respectively. Experimental results show that DEGAN not only outperforms several state-of-the-art methods on the public ModelNet and ShapeNet datasets but also predicts the occluded/missing parts of 3D objects more reliably.
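The generator objective described above — two adversarial terms (from the voxel- and point-based branches of the discriminator) plus a sampling point loss on high-uncertainty predictions — can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the weights `w_voxel`, `w_point`, `w_sample`, the non-saturating GAN formulation, and the helper `bce` are all assumptions introduced for clarity.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    # Binary cross-entropy over occupancy probabilities in [0, 1].
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def generator_loss(recon, gt, d_voxel_score, d_point_score, sample_idx,
                   w_voxel=1.0, w_point=1.0, w_sample=0.5):
    """Toy composition of the generator objective outlined in the abstract:
    voxel and point adversarial terms plus a sampling point loss restricted
    to sampled high-uncertainty locations. Weights are illustrative
    assumptions, not values from the paper."""
    # Non-saturating GAN losses: the generator is rewarded when the
    # discriminator scores its output close to 1 ("real").
    adv_voxel = -np.log(np.clip(d_voxel_score, 1e-7, 1.0))
    adv_point = -np.log(np.clip(d_point_score, 1e-7, 1.0))
    # Sampling point loss: reconstruction error only at the sampled
    # indices into the flattened occupancy grid.
    sample_loss = bce(recon.ravel()[sample_idx], gt.ravel()[sample_idx])
    return float(w_voxel * adv_voxel + w_point * adv_point + w_sample * sample_loss)
```

Under this sketch, a reconstruction that fools both discriminator branches (scores near 1) and matches the ground truth at the sampled voxels yields a loss near zero, while low discriminator scores or errors at high-uncertainty voxels drive the loss up.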


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 12
    December 2024
    721 pages
    EISSN:1551-6865
    DOI:10.1145/3618076

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 November 2024
    Online AM: 30 August 2024
    Accepted: 15 August 2024
    Revised: 16 April 2024
    Received: 18 December 2023
    Published in TOMM Volume 20, Issue 12


    Author Tags

    1. 3D shape reconstruction
    2. Generative adversarial network
    3. Voxel-point embedding network
    4. Sampling point loss
    5. Monocular depth image

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Beijing Natural Science Foundation
    • R&D Program of Beijing Municipal Education Commission
