research-article

Online Scene CAD Recomposition via Autonomous Scanning

Published: 05 December 2023

Abstract

Autonomous surface reconstruction of 3D scenes has been studied intensively in recent years; however, it remains difficult to accurately reconstruct all surface details of complex scenes with intricate object relations and severe occlusions, which makes the reconstruction results unsuitable for direct use in applications such as gaming and virtual reality. Therefore, instead of reconstructing detailed surfaces, we aim to recompose the scene with CAD models retrieved from a given dataset, faithfully reflecting the object geometry and arrangement of the given scene. Moreover, unlike most previous work on scene CAD recomposition, which requires an offline-reconstructed scene or a captured video as input and therefore incurs significant data redundancy, we propose a novel online scene CAD recomposition method with autonomous scanning that efficiently recomposes the scene in a single online scanning pass, guided by automatically optimized Next-Best-Views (NBVs). Based on the key observation that spatial relations in the scene can not only constrain object pose and layout optimization but also guide NBV generation, our system consists of two key modules: a relation-guided CAD recomposition module, which uses relation-constrained global optimization to obtain accurate object pose and layout estimates, and a relation-aware NBV generation module, which tailors exploration during autonomous scanning to the recomposition task. Extensive experiments demonstrate the superiority of our method over previous methods in scanning efficiency and retrieval accuracy, as well as the importance of each key component of our method.
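For readers unfamiliar with Next-Best-View planning, the sketch below shows a generic, greedy NBV selection step on a 2D occupancy grid: each candidate viewpoint is scored by how many still-unknown cells its rays would reach, and the highest-scoring candidate is chosen. This is a common baseline idea, not the relation-aware NBV module described in the abstract. The grid representation, the ray-casting scheme, and the names `visible_unknown_cells` and `next_best_view` are hypothetical assumptions made purely for illustration.

```python
import math
from typing import List, Set, Tuple

# Hypothetical cell states for a 2D occupancy grid (illustration only).
UNKNOWN, FREE, OCCUPIED = 0, 1, 2

Grid = List[List[int]]
Cell = Tuple[int, int]  # (row, col)


def visible_unknown_cells(grid: Grid, view: Cell,
                          max_range: int = 5, num_rays: int = 16) -> Set[Cell]:
    """Return the UNKNOWN cells that evenly spaced rays cast from `view`
    would reach before leaving the grid or hitting an OCCUPIED cell.
    Unknown cells are treated as see-through (an optimistic assumption)."""
    rows, cols = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    for k in range(num_rays):
        angle = 2.0 * math.pi * k / num_rays
        dx, dy = math.cos(angle), math.sin(angle)
        for step in range(1, max_range + 1):
            r = round(view[0] + dy * step)
            c = round(view[1] + dx * step)
            if not (0 <= r < rows and 0 <= c < cols):
                break  # ray left the map
            if grid[r][c] == OCCUPIED:
                break  # ray blocked by a known surface
            if grid[r][c] == UNKNOWN:
                seen.add((r, c))
    return seen


def next_best_view(grid: Grid, candidates: List[Cell],
                   max_range: int = 5, num_rays: int = 16) -> Tuple[Cell, int]:
    """Greedy NBV: pick the candidate that would reveal the most unknown cells."""
    scored = [(len(visible_unknown_cells(grid, v, max_range, num_rays)), v)
              for v in candidates]
    gain, best = max(scored, key=lambda s: s[0])
    return best, gain


if __name__ == "__main__":
    # 5x7 map: columns 0-4 already observed as free, columns 5-6 still unknown.
    grid = [[FREE] * 5 + [UNKNOWN] * 2 for _ in range(5)]
    best, gain = next_best_view(grid, [(2, 0), (2, 4)])
    print(best)  # (2, 4): the viewpoint beside the unexplored region
```

In this toy setting the viewpoint adjacent to the unexplored columns wins. A relation-aware variant would replace the plain unknown-cell count with a utility informed by spatial relations between objects; the abstract does not specify that utility, so it is not modeled here.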

Supplemental Material

  • MP4 file (supplemental)
  • ZIP file (supplemental)



Information

Published In

ACM Transactions on Graphics  Volume 42, Issue 6
December 2023
1565 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3632123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in TOG Volume 42, Issue 6


Author Tags

  1. autonomous scanning
  2. relation-constrained retrieval
  3. relation-guided pose optimization
  4. scene CAD recomposition

Article Metrics

  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 2
Reflects downloads up to 14 Feb 2025.
