Abstract
Shape retrieval and alignment are a promising avenue towards turning 3D scans into lightweight CAD representations that can be used for content creation in scenarios such as mobile or AR/VR gaming. Unfortunately, CAD model retrieval is limited by the availability of models in standard 3D shape collections (e.g., ShapeNet). In this work, we address this shortcoming by introducing CAD-Deform (code available at https://github.com/alexeybokhovkin/CAD-Deform), a method that obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. Our key contribution is a new non-rigid deformation model that incorporates smooth transformations and preserves sharp features, simultaneously achieving very tight fits of CAD models to the 3D scan and maintaining the clean, high-quality surface properties of hand-modeled CAD objects. A series of thorough experiments demonstrates that our method achieves significantly tighter scan-to-CAD fits, allowing a more accurate digital replica of the scanned real-world environment while preserving important geometric features present in synthetic CAD environments.
V. Ishimtsev and A. Bokhovkin—equal contribution.
A. Artemov—Technical lead.
References
Achenbach, J., Zell, E., Botsch, M.: Accurate face reconstruction through anisotropic fitting and eye correction. In: VMV, pp. 1–8 (2015)
Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds. In: International Conference on Machine Learning, pp. 40–49 (2018)
Amberg, B., Romdhani, S., Vetter, T.: Optimal step nonrigid ICP algorithms for surface registration. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs: exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014)
Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A.X., Nießner, M.: Scan2CAD: learning CAD model alignment in RGB-D scans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2614–2623 (2019)
Avetisyan, A., Dai, A., Nießner, M.: End-to-end CAD model retrieval and 9DoF alignment in 3D scans. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2551–2560 (2019)
Botsch, M., Kobbelt, L.: An intuitive framework for real-time freeform modeling. ACM Trans. Graph. (TOG) 23(3), 630–634 (2004)
Cagniart, C., Boyer, E., Ilic, S.: Iterative mesh deformation for dense surface tracking. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 1465–1472. IEEE (2009)
Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 303–312. Association for Computing Machinery, New York (1996). https://doi.org/10.1145/237170.237269
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)
Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration. ACM Trans. Graph. (TOG) (2017)
Dai, A., Qi, C.R., Nießner, M.: Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5868–5877 (2017)
Deng, H., Birdal, T., Ilic, S.: 3D local features for direct pairwise registration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Dey, T.K., Fu, B., Wang, H., Wang, L.: Automatic posing of a meshed human model using point clouds. Comput. Graph. 46, 14–24 (2015)
Drost, B., Ilic, S.: 3D object detection and localization using multimodal point pair features. In: 3DIMPVT, pp. 9–16. IEEE Computer Society (2012)
Egiazarian, V., et al.: Latent-space Laplacian pyramids for adversarial representation learning with 3D point clouds, December 2019
Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5431–5440 (2016)
Fröhlich, S., Botsch, M.: Example-driven deformations based on discrete shells. Comput. Graph. Forum 30, 2246–2257 (2011). https://doi.org/10.1111/j.1467-8659.2011.01974.x
Grinspun, E., Hirani, A.N., Desbrun, M., Schröder, P.: Discrete shells. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2003, pp. 62–67. Eurographics Association, Goslar, DEU (2003)
Guo, R., Zou, C., Hoiem, D.: Predicting complete 3D models of indoor scenes. arXiv preprint arXiv:1504.02437 (2015)
Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4731–4740 (2015)
He, L., Schaefer, S.: Mesh denoising via L0 minimization. In: Proceedings of ACM SIGGRAPH, pp. 64:1–64:8 (2013)
Huang, J., Su, H., Guibas, L.: Robust watertight manifold surface generation method for ShapeNet models. arXiv preprint arXiv:1802.01698 (2018)
Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: UIST 2011 Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 559–568. ACM (2011)
Jacobson, A., Tosun, E., Sorkine, O., Zorin, D.: Mixed finite elements for variational surface modeling. In: Computer Graphics Forum, vol. 29, pp. 1565–1574. Wiley Online Library (2010)
Koch, S., et al.: ABC: a big CAD model dataset for geometric deep learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Li, Y., Dai, A., Guibas, L., Nießner, M.: Database-assisted object retrieval for real-time 3D reconstruction. Comput. Graph. Forum 34(2), 435–446 (2015)
Liao, M., Zhang, Q., Wang, H., Yang, R., Gong, M.: Modeling deformable objects from a single depth camera. In: IEEE International Conference on Computer Vision (ICCV), pp. 167–174, November 2009. https://doi.org/10.1109/ICCV.2009.5459161
Mattausch, O., Panozzo, D., Mura, C., Sorkine-Hornung, O., Pajarola, R.: Object detection and classification from large-scale cluttered indoor scans. In: Computer Graphics Forum, vol. 33, pp. 11–21. Wiley Online Library (2014)
Mo, K., et al.: PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 909–918 (2019)
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: IEEE ISMAR. IEEE, October 2011
Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3D reconstruction at scale using voxel hashing. In: ACM Transactions on Graphics (TOG) (2013)
Park, S.I., Lim, S.J.: Template-based reconstruction of surface mesh animation from point cloud animation. ETRI J. 36(6), 1008–1015 (2014)
Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152. IEEE (2001)
Salas-Moreno, R.F., Newcombe, R.A., Strasdat, H., Kelly, P.H.J., Davison, A.J.: SLAM++: simultaneous localisation and mapping at the level of objects. In: CVPR, pp. 1352–1359. IEEE Computer Society (2013)
Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746–1754 (2017)
Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Symposium on Geometry Processing, vol. 4, pp. 109–116 (2007)
Stoll, C., Karni, Z., Rössl, C., Yamauchi, H., Seidel, H.P.: Template deformation for point cloud fitting. In: SPBG, pp. 27–35 (2006)
Choi, S., Zhou, Q.-Y., Koltun, V.: Robust reconstruction of indoor scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5556–5565, June 2015. https://doi.org/10.1109/CVPR.2015.7299195
Váša, L., Rus, J.: Dihedral angle mesh error: a fast perception correlated distortion measure for fixed connectivity triangle meshes. Comput. Graph. Forum 31(5), 1715–1724 (2012). https://doi.org/10.1111/j.1467-8659.2012.03176.x, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8659.2012.03176.x
Whelan, T., Leutenegger, S., Salas-Moreno, R.F., Glocker, B., Davison, A.J.: ElasticFusion: dense SLAM without a pose graph. In: Robotics: Science and Systems (RSS), Rome, Italy, July 2015
Zhou, X., Karpur, A., Gan, C., Luo, L., Huang, Q.: Unsupervised domain adaptation for 3D keypoint estimation via view consistency. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 141–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_9
Acknowledgements
The authors acknowledge the usage of the Skoltech CDISE HPC cluster Zhores for obtaining the results presented in this paper. The work was partially supported by the Russian Science Foundation under Grant 19-41-04109.
Appendices
A Statistics on the Used Datasets
In Tables 4 and 5, we summarize statistics on the number of instances and categories considered in our evaluation. As part annotations are an important ingredient of our deformation, we only select instances in Scan2CAD [5] for which the associated part annotation in PartNet [31] is available, resulting in a total of 9 categories (25%), 572 instances (18%), and 1979 annotated correspondences (14%). Note that the most common cases remain within our consideration, keeping our evaluation comprehensive.
We further select the six best-represented shape categories as our core evaluation set, outlined in Table 5. Note that since our method is non-learnable, we can just as easily experiment with the remaining categories, at the cost of somewhat reduced statistical power.
B Optimization Details
Our full experimental pipeline is a sequence of deformation stages with different optimization parameters, with the Hessian recomputed before each stage. Specifically, we perform one part-to-part optimization with parameters \(\alpha _{\text {shape}} = 1, \alpha _{\text {smooth}} = 0, \alpha _{\text {sharp}} = 0, \alpha _{\text {data}} = 5\times 10^{4}\) for 100 iterations, followed by 5 runs of nearest-neighbor deformation for 50 iterations each with parameters \(\alpha _{\text {shape}} = 1, \alpha _{\text {smooth}} = 10, \alpha _{\text {sharp}} = 10, \alpha _{\text {data}} = 10^3\). This number of iterations was sufficient to reach convergence, with energy changes below \(10^{-1}\), in our experiments. For a typical mesh of \(10^4\) vertices, the runtime of our method breaks down into cost computation (\(\sim 0.3\) s), the backward pass (\(\sim 0.2\) s), and optimization steps containing the main bottleneck, sparse matrix-vector multiplication (\(\sim 1.2\) s). All operations can easily be optimized further.
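As a rough illustration of this two-stage schedule, the sketch below runs the stages with the weights above. It is a minimal toy, not our implementation (see the repository linked in the abstract): plain gradient descent stands in for the second-order solver, only the data and smoothness terms are modeled, and all helper names are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy stand-ins for two of the energy terms; the full energy in the paper
# also includes the shape and sharp-feature terms.

def data_grad(verts, tree, scan_pts):
    # Gradient of 0.5 * sum ||v - nn(v)||^2: pulls vertices toward
    # their nearest scan points (nearest neighbors held fixed).
    _, idx = tree.query(verts)
    return verts - scan_pts[idx]

def smooth_grad(verts, neighbors):
    # Uniform-Laplacian smoothness gradient on the mesh graph.
    lap = np.zeros_like(verts)
    for i, nbrs in enumerate(neighbors):
        lap[i] = verts[i] - verts[list(nbrs)].mean(axis=0)
    return lap

SCHEDULE = [
    # (alpha_smooth, alpha_data, iterations, runs); shape/sharp omitted here
    (0.0, 5e4, 100, 1),   # stage 1: part-to-part optimization
    (10.0, 1e3, 50, 5),   # stage 2: nearest-neighbor deformation
]

def fit(verts, neighbors, scan_pts, lr=1e-6):
    """verts: (N, 3) mesh vertices; neighbors: per-vertex adjacency lists;
    scan_pts: (M, 3) scan points."""
    tree = cKDTree(scan_pts)
    for a_smooth, a_data, n_iter, n_runs in SCHEDULE:
        for _ in range(n_runs):
            for _ in range(n_iter):
                g = a_data * data_grad(verts, tree, scan_pts)
                g += a_smooth * smooth_grad(verts, neighbors)
                verts = verts - lr * g
    return verts
```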
C Qualitative Fitting Results
In Fig. 6, we display a series of qualitative results showing a variety of shape deformations across different classes of instances. Compared to the baselines, our framework achieves accurate fits while preserving perceptual quality.
Table 6 reports category-wise surface quality evaluation for deformations obtained with our CAD-Deform vs. the baselines. Our method outperforms the baselines across all categories; we find the smoothness and sharpness energy terms to be the crucial ingredients for keeping meshes high-quality.
Figure 7 visually displays the deformation results for three distinct classes, highlighting differences between the surfaces obtained using the three methods.
Table 7 reports shape abnormality evaluation results across the six considered categories. The baselines show low reconstruction quality (Fig. 8), as evidenced by a larger number of black points; in other words, compared to CAD-Deform, the distance from these meshes to the undeformed ones is much larger.
In Fig. 9, we show a series of examples from the CAD-Deform ablation study. Perceptual quality degrades whenever any term is excluded from the energy.
D Morphing
In this section, we present an additional series of examples of morphing properties (Fig. 10). Each iteration of the optimization process gradually improves the quality of fit. With CAD-Deform, we can morph each part to imitate the structure of the target shape.
E PartNet Annotation
This set of experiments shows how fitting quality depends on the labeling of mesh vertices. Meshes can be labeled in different ways depending on the chosen level of the PartNet hierarchy [31]; see the sketch below. We observe that fitting quality increases with the level of detail (Table 8). The examples presented in Fig. 11 are selected as the most distinguishable deformations across levels; overall, the visual differences in deformation performance between part labeling levels are minor.
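One way such level-dependent labels can be derived is sketched below. This is a hypothetical illustration, not our annotation pipeline: it assumes a PartNet-style part tree with `label` and `children` fields and a per-vertex assignment to leaf parts.

```python
# Relabel vertices at a chosen depth of a PartNet-style part hierarchy.
# Leaves shallower than `level` simply keep their own label.

def ancestors_at_level(root, level):
    """Map each leaf-part label to the label of its ancestor at `level`."""
    mapping = {}
    def walk(node, depth, anchor):
        if depth == level:
            anchor = node               # ancestor inherited by all descendants
        if not node.children:           # leaf part
            mapping[node.label] = (anchor or node).label
        for child in node.children:
            walk(child, depth + 1, anchor)
    walk(root, 0, None)
    return mapping

def label_vertices(root, leaf_label_per_vertex, level):
    """Per-vertex labels at the requested hierarchy level."""
    mapping = ancestors_at_level(root, level)
    return [mapping[leaf] for leaf in leaf_label_per_vertex]
```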
F Fitting Accuracy Analysis
The CAD-Deform framework is sensitive to the accuracy threshold \(\tau \) on the distance between mesh vertices and nearby scan points. Figure 12 presents the effect of varying \(\tau \); we select \(\tau = 0.2\text {~m}\) for the fitting accuracy metric.
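Under our reading of the metric (the fraction of deformed-mesh vertices lying within \(\tau \) of the scan), a minimal sketch looks as follows; the function name is ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def fitting_accuracy(mesh_verts, scan_pts, tau=0.2):
    """Fraction of mesh vertices within tau meters of their closest scan point."""
    dists, _ = cKDTree(scan_pts).query(mesh_verts)
    return float((dists <= tau).mean())
```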
G Perceptual Assessment and User Study Details
Having obtained a collection of deformed meshes, we assess their visual quality in comparison to two baseline deformation methods, as-rigid-as-possible (ARAP) [38] and Harmonic deformation [7, 26], using a set of perceptual quality measures. The details of our user study design and visual assessment are provided in the supplementary material. To this end, we use the original and deformed meshes to compute DAME and reconstruction errors, as outlined in Sect. 6.1, and complement these with visual quality scores obtained from a user study (see below). These scores, presented in Table 3, demonstrate that shapes obtained using CAD-Deform have \(2\times \) higher surface quality, deviate only slightly from the undeformed shapes as viewed by neural autoencoders, and receive \(2\times \) higher ratings in human assessment, while sacrificing only 1.1–4.5% accuracy compared to the other deformation methods.
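To make the surface-quality comparison concrete, below is a simplified sketch of a DAME-like score: the mean absolute change of dihedral angles over the shared connectivity of an original and a deformed mesh. This is only an approximation; the full measure of Váša and Rus [41] additionally applies perceptual masking weights, and all function names here are ours.

```python
import numpy as np

def face_normals(verts, faces):
    # Unit normals per triangle; assumes non-degenerate faces.
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def edge_face_pairs(faces):
    # Map each interior edge to its two adjacent faces.
    edges = {}
    for fi, f in enumerate(faces):
        for u, v in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edges.setdefault(tuple(sorted((int(u), int(v)))), []).append(fi)
    return [fs for fs in edges.values() if len(fs) == 2]

def dame_simplified(verts_ref, verts_def, faces):
    """Mean absolute dihedral-angle change between two meshes that share
    connectivity (simplified: no perceptual masking or visibility weights)."""
    pairs = np.array(edge_face_pairs(faces))
    angles = []
    for verts in (verts_ref, verts_def):
        n = face_normals(verts, faces)
        cos = np.clip((n[pairs[:, 0]] * n[pairs[:, 1]]).sum(axis=1), -1.0, 1.0)
        angles.append(np.arccos(cos))  # angle between adjacent face normals
    return float(np.abs(angles[0] - angles[1]).mean())
```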
Qualitative comparison of deformations obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform, with shapes colored according to the value of the DAME measure [41]. Our approach results in drastic improvements in local surface quality, producing higher-quality surfaces compared to the other deformations.
Qualitative comparison of reconstructions of point clouds extracted from mesh vertices. The meshes are obtained using ARAP [38], Harmonic deformation [7, 26], and our CAD-Deform; the first column corresponds to the original undeformed meshes. The color of the reconstructed point clouds reflects the Earth Mover's Distance between the reconstructed and original point clouds of mesh vertices.
Design of Our User Study. Users were asked to examine renders of shapes from four different categories: the original undeformed shapes as well as shapes deformed using the ARAP, Harmonic, and CAD-Deform methods, and to score each shape according to the following perceptual aspects: surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. Ten random shapes from each of the four categories were rendered from eight different views and scored by 100 unique users on a scale from 1 (bad) to 10 (good). The resulting visual quality scores are computed by averaging over users and shapes in each category.
In Fig. 13, we present the distribution of user scores over the different deformation methods and shapes. Users clearly prefer our deformation results to the baselines in all cases, as evident from the gap between the CAD-Deform histogram and the ARAP/Harmonic histograms. At the same time, shapes deformed by CAD-Deform are close to the undeformed ShapeNet shapes in terms of surface quality and smoothness, mesh symmetry, visual similarity to real-world objects, and overall consistency. In addition, Tables 9 and 10 provide evaluations of the ARAP/Harmonic deformations with respect to changes of the Laplacian term weight.
Deformation performance depending on the level of labeling from the PartNet dataset [31]. Deformed mesh surfaces are colored according to the value of the tMMD measure, with darker values corresponding to larger distances.