Abstract
We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360\(^\circ \) panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized using GTSAM [16]. Once the room poses are computed, room layouts are inferred using HorizonNet [50], and the floorplan is constructed by stitching the most confident layout boundaries. We validate our system qualitatively and quantitatively, as well as through ablation studies, showing that it outperforms state-of-the-art SfM systems in completeness by over 200% without sacrificing accuracy. In our results, the poses of 81% of panoramas are localized within the first 2 connected components (CCs), and 89% within the first 3 CCs.
J. Lambert—Work completed during an internship at Zillow Group.
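The connected-component figures quoted in the abstract can be illustrated with a small sketch. It assumes one plausible reading of the statistic: panorama pairs accepted by a verifier form the edges of a graph, components are ranked by size, and the reported number is the share of panoramas that fall in the largest k components. The scores, the 0.5 threshold, and the function names below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def connected_component_sizes(num_nodes, edges):
    """Return component sizes (largest first) for an undirected graph, using union-find."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(n) for n in range(num_nodes))
    return sorted(sizes.values(), reverse=True)


def fraction_in_top_k(cc_sizes, k):
    """Share of all nodes that lie in the k largest components."""
    return sum(cc_sizes[:k]) / sum(cc_sizes)


# Hypothetical verifier output: (panorama_i, panorama_j, alignment score).
scored_pairs = [
    (0, 1, 0.97), (1, 2, 0.91), (2, 3, 0.40),
    (3, 4, 0.88), (5, 6, 0.95), (7, 8, 0.30),
]
THRESHOLD = 0.5   # assumed acceptance threshold, for illustration only
NUM_PANOS = 10    # panoramas 0..9; panorama 9 has no candidate pairs

accepted = [(i, j) for i, j, score in scored_pairs if score >= THRESHOLD]
sizes = connected_component_sizes(NUM_PANOS, accepted)

print(sizes)
print(fraction_in_top_k(sizes, 2), fraction_in_top_k(sizes, 3))
```

In this toy graph the rejected pairs (2, 3) and (7, 8) split the panoramas into several components, so the share covered by the top k components grows as k increases, which is the quantity the abstract reports for k = 2 and k = 3.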
References
Albanis, G., et al.: Pano3D: a holistic benchmark and a solid baseline for 360\(^{\circ }\) depth estimation. CVPR Workshops (2021)
Aly, M., Bouguet, J.Y.: Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas. In: 2012 IEEE Workshop on the Applications of Computer Vision (WACV), pp. 1–8 (2012)
Balntas, V., Li, S., Prisacariu, V.: RelocNet: continuous metric learning relocalisation using neural nets. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 782–799. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_46
Bao, S.Y., Savarese, S.: Semantic structure from motion. In: CVPR (2011)
Cabral, R., Furukawa, Y.: Piecewise planar and compact floorplan reconstruction from images. In: CVPR (2014)
Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Chen, J., Liu, C., Wu, J., Furukawa, Y.: Floor-SP: inverse CAD for floorplans by sequential room-wise shortest path. In: ICCV (2019)
Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: CVPR (2021)
Choi, S., Kim, J.H.: Fast and reliable minimal relative pose estimation under planar motion. Image Vis. Comput. 69, 103–112 (2018)
Choudhary, S., Trevor, A.J., Christensen, H.I., Dellaert, F.: SLAM with object discovery, modeling and mapping. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1018–1025. IEEE (2014)
Cobbe, K., et al.: Training verifiers to solve math word problems. arXiv:2110.14168 (2021)
Cohen, A., Sattler, T., Pollefeys, M.: Merging the unmatchable: stitching visually disconnected SfM models. In: ICCV (2015)
Cohen, A., Schönberger, J.L., Speciale, P., Sattler, T., Frahm, J.-M., Pollefeys, M.: Indoor-outdoor 3D reconstruction alignment. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 285–300. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_18
Cruz, S., Hutchcroft, W., Li, Y., Khosravan, N., Boyadzhiev, I., Kang, S.B.: Zillow indoor dataset: annotated floor plans with 360\(^{\circ }\) panoramas and 3D room layouts. In: CVPR (2021)
Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996 (1996)
Dellaert, F.: Factor graphs and GTSAM: a hands-on introduction. Technical report, Georgia Institute of Technology (2012)
Dellaert, F., Burgard, W., Fox, D., Thrun, S.: Using the condensation algorithm for robust, vision-based mobile robot localization. In: CVPR (1999)
Ding, M., Wang, Z., Sun, J., Shi, J., Luo, P.: CamNet: coarse-to-fine retrieval for camera re-localization. In: ICCV (2019)
Enqvist, O., Kahl, F., Olsson, C.: Non-sequential structure from motion. In: ICCV Workshops (2011)
Fang, H., Lafarge, F., Pan, C., Huang, H.: Floorplan generation from 3D point clouds: a space partitioning approach. ISPRS J. Photogram. Remote Sens. 175, 44–55 (2021)
Fang, H., Pan, C., Huang, H.: Structure-aware indoor scene reconstruction via two levels of abstraction. ISPRS J. Photogram. Remote Sens. 178, 155–170 (2021)
Farin, D., Effelsberg, W., de With, P.H.: Floor-plan reconstruction from panoramic images. In: Proceedings of the 15th ACM International Conference on Multimedia (2007)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Reconstructing building interiors from images. In: ICCV (2009)
Gargallo, P., Kuang, Y., et al.: OpenSfM (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Jin, L., Qian, S., Owens, A., Fouhey, D.F.: Planar surface reconstruction from sparse views. In: ICCV (2021)
Kim, Y.M., Dolson, J., Sokolsky, M., Koltun, V., Thrun, S.: Interactive acquisition of residential floor plans. In: ICRA (2012)
Laskar, Z., Melekhov, I., Kalia, S., Kannala, J.: Camera relocalization by computing pairwise relative poses using convolutional neural network. In: ICCV Workshops (2017)
Lin, C., Li, C., Wang, W.: Floorplan-jigsaw: jointly estimating scene layout and aligning partial scans. In: ICCV (2019)
Liu, C., Wu, J., Furukawa, Y.: FloorNet: a unified framework for floorplan reconstruction from 3D scans. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 203–219. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_13
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60(1), 63–86 (2004)
Moulon, P., Monasse, P., Marlet, R.: Global fusion of relative motions for robust, accurate and scalable structure from motion. In: ICCV (2013)
Moulon, P., Monasse, P., Perrot, R., Marlet, R.: OpenMVG: open multiple view geometry. In: Kerautret, B., Colom, M., Monasse, P. (eds.) RRPR 2016. LNCS, vol. 10214, pp. 60–74. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56414-2_5
Okorn, B., Xiong, X., Akinci, B., Huber, D.: Toward automated modeling of floor plans. In: 3D DPVT (2010)
Oskarsson, M.: Two-view orthographic epipolar geometry: minimal and optimal solvers. J. Math. Imaging Vis. 60(2), 163–173 (2018)
Ozyesil, O., Voroninski, V., Basri, R., Singer, A.: A survey of structure from motion. Acta Numerica 26, 305–364 (2017)
Pintore, G., Ganovelli, F., Pintus, R., Scopigno, R., Gobbetti, E.: 3D floor plan recovery from overlapping spherical images. Comput. Visual Media 4(4), 367–383 (2018)
Pintore, G., Ganovelli, F., Villanueva, A.J., Gobbetti, E.: Automatic modeling of cluttered multi-room floor plans from panoramic images. Comput. Graph. Forum 38(7) (2019)
Pintore, G., Mura, C., Ganovelli, F., Fuentes-Perez, L., Pajarola, R., Gobbetti, E.: State-of-the-art in automatic 3D reconstruction of structured indoor environments. Comput. Graph. Forum 39(2) (2020)
Purushwalkam, S., et al.: Audio-visual floorplan reconstruction. In: ICCV (2021)
Reddy, B., Chatterji, B.: An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 5(8), 1266–1271 (1996)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)
Shabani, M.A., Song, W., Odamaki, M., Fujiki, H., Furukawa, Y.: Extreme structure from motion for indoor panoramas without visual overlaps. In: ICCV (2021)
Shen, J., Yin, Y., Li, L., Shang, L., Zhang, M., Liu, Q.: Generate & Rank: a multi-task framework for math word problems. In: Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics (2021)
Son, K., Moreno, D., Hays, J., Cooper, D.B.: Solving small-piece jigsaw puzzles by growing consensus. In: CVPR (2016)
Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: CVPR (2017)
Stekovic, S., Rad, M., Fraundorfer, F., Lepetit, V.: MonteFloor: extending MCTS for reconstructing accurate large-scale floor plans. In: ICCV (2021)
Sun, C., Hsiao, C.W., Sun, M., Chen, H.T.: HorizonNet: learning room layout with 1D representation and pano stretch data augmentation. In: CVPR (2019)
Sun, C., Sun, M., Chen, H.T.: HoHoNet: 360 indoor holistic understanding with latent horizontal features. In: CVPR (2021)
Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: CVPR (2021)
Sweeney, C., Hollerer, T., Turk, M.: Theia: a fast and scalable structure-from-motion library. In: Proceedings of the 23rd ACM International Conference on Multimedia (2015)
Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_5
Yang, Z., Pan, J.Z., Luo, L., Zhou, X., Grauman, K., Huang, Q.: Extreme relative pose estimation for RGB-D scans via scene completion. In: CVPR (2019)
Yang, Z., Yan, S., Huang, Q.: Extreme relative pose network under hybrid representations. In: CVPR (2020)
Zach, C., Klopschitz, M., Pollefeys, M.: Disambiguating visual relations using loop constraints. In: CVPR (2010)
Zhang, F., Nauata, N., Furukawa, Y.: Conv-MPN: convolutional message passing neural network for structured outdoor architecture reconstruction. In: CVPR (2020)
Zhang, Y., Song, S., Tan, P., Xiao, J.: PanoContext: a whole-room 3D context model for panoramic scene understanding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 668–686. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_43
Zheng, J., Zhang, J., Li, J., Tang, R., Gao, S., Zhou, Z.: Structured3D: a large photo-realistic dataset for structured 3D modeling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 519–535. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_30
Zou, C., et al.: Manhattan room layout reconstruction from a single 360\(^{\circ }\) image: a comparative study of state-of-the-art methods. Int. J. Comput. Vis. 129(5), 1410–1431 (2021)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lambert, J. et al. (2022). SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13691. Springer, Cham. https://doi.org/10.1007/978-3-031-19821-2_37
DOI: https://doi.org/10.1007/978-3-031-19821-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19820-5
Online ISBN: 978-3-031-19821-2
eBook Packages: Computer Science, Computer Science (R0)