SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

Lambert, John; Li, Yuguang; Boyadzhiev, Ivaylo; Wixson, Lambert; Narayana, Manjunath; Hutchcroft, Will; Hays, James; Dellaert, Frank; Kang, Sing Bing

doi:10.1007/978-3-031-19821-2_37

Conference paper
First Online: 23 October 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13691)

Abstract

We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360° panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized using GTSAM [16]. Once the room poses are computed, room layouts are inferred using HorizonNet [50], and the floorplan is constructed by stitching the most confident layout boundaries. We validate our system qualitatively and quantitatively as well as through ablation studies, showing that it outperforms state-of-the-art SfM systems in completeness by over 200%, without sacrificing accuracy. Our results point to the significance of our work: poses of 81% of panoramas are localized in the first 2 connected components (CCs), and 89% in the first 3 CCs.
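
The connected-component statistic quoted in the abstract can be illustrated with a minimal sketch. It assumes that each verified panorama pair contributes one relative 2D pose (x, y, theta) and that global poses are initialized by chaining relative poses along a breadth-first spanning tree of each component. The function names (`compose`, `invert`, `localize`, `fraction_localized`) and the toy edge values are hypothetical and not taken from the paper, and the sketch does not perform the GTSAM optimization the abstract describes.

```python
import math
from collections import defaultdict, deque


def compose(a, b):
    """Compose two SE(2) poses given as (x, y, theta): returns a * b."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def invert(p):
    """Invert an SE(2) pose (x, y, theta)."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)


def localize(num_panos, verified_edges):
    """Chain relative poses into per-component global poses.

    verified_edges maps (i, j) to the relative pose of panorama j in the
    frame of panorama i; these are assumed to have been accepted by some
    pairwise verifier (hypothetical input, not computed here).
    Returns a list of components, largest first; each is {pano_id: pose}.
    """
    adjacency = defaultdict(list)
    for (i, j), i_T_j in verified_edges.items():
        adjacency[i].append((j, i_T_j))
        adjacency[j].append((i, invert(i_T_j)))

    seen = set()
    components = []
    for root in range(num_panos):
        if root in seen:
            continue
        # Each component gets its own arbitrary origin at its root panorama.
        poses = {root: (0.0, 0.0, 0.0)}
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, u_T_v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    poses[v] = compose(poses[u], u_T_v)
                    queue.append(v)
        components.append(poses)

    components.sort(key=len, reverse=True)
    return components


def fraction_localized(components, num_panos, k):
    """Fraction of panoramas that fall in the k largest components."""
    return sum(len(c) for c in components[:k]) / num_panos


if __name__ == "__main__":
    # Toy input: 6 panoramas, 3 verified pairs, panorama 5 left unconnected.
    edges = {
        (0, 1): (1.0, 0.0, math.pi / 2),
        (1, 2): (1.0, 0.0, 0.0),
        (3, 4): (2.0, 0.0, 0.0),
    }
    comps = localize(6, edges)
    for k in (1, 2, 3):
        print(k, fraction_localized(comps, 6, k))
```

Chaining along a spanning tree ignores loop closures, so any error in one relative pose propagates to every panorama downstream of it; a joint optimization over all verified edges is what corrects that.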

J. Lambert—Work completed during an internship at Zillow Group.

Notes

1. Openings are constructs that divide a large room into multiple parts [14].
2. We achieve this orientation assumption via pre-processing that straightens the panoramas using vanishing points [59].

References

1. Albanis, G., et al.: Pano3D: a holistic benchmark and a solid baseline for 360° depth estimation. In: CVPR Workshops (2021)
2. Aly, M., Bouguet, J.Y.: Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas. In: 2012 IEEE Workshop on the Applications of Computer Vision (WACV), pp. 1–8 (2012)
3. Balntas, V., Li, S., Prisacariu, V.: RelocNet: continuous metric learning relocalisation using neural nets. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 782–799. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_46
4. Bao, S.Y., Savarese, S.: Semantic structure from motion. In: CVPR (2011)
5. Cabral, R., Furukawa, Y.: Piecewise planar and compact floorplan reconstruction from images. In: CVPR (2014)
6. Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
7. Chen, J., Liu, C., Wu, J., Furukawa, Y.: Floor-SP: inverse CAD for floorplans by sequential room-wise shortest path. In: ICCV (2019)
8. Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: CVPR (2021)
9. Choi, S., Kim, J.H.: Fast and reliable minimal relative pose estimation under planar motion. Image Vis. Comput. 69, 103–112 (2018)
10. Choudhary, S., Trevor, A.J., Christensen, H.I., Dellaert, F.: SLAM with object discovery, modeling and mapping. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1018–1025. IEEE (2014)
11. Cobbe, K., et al.: Training verifiers to solve math word problems. arXiv:2110.14168 (2021)
12. Cohen, A., Sattler, T., Pollefeys, M.: Merging the unmatchable: stitching visually disconnected SfM models. In: ICCV (2015)
13. Cohen, A., Schönberger, J.L., Speciale, P., Sattler, T., Frahm, J.-M., Pollefeys, M.: Indoor-outdoor 3D reconstruction alignment. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 285–300. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_18
14. Cruz, S., Hutchcroft, W., Li, Y., Khosravan, N., Boyadzhiev, I., Kang, S.B.: Zillow indoor dataset: annotated floor plans with 360° panoramas and 3D room layouts. In: CVPR (2021)
15. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996 (1996)
16. Dellaert, F.: Factor graphs and GTSAM: a hands-on introduction. Technical report, Georgia Institute of Technology (2012)
17. Dellaert, F., Burgard, W., Fox, D., Thrun, S.: Using the condensation algorithm for robust, vision-based mobile robot localization. In: CVPR (1999)
18. Ding, M., Wang, Z., Sun, J., Shi, J., Luo, P.: CamNet: coarse-to-fine retrieval for camera re-localization. In: ICCV (2019)
19. Enqvist, O., Kahl, F., Olsson, C.: Non-sequential structure from motion. In: ICCV Workshops (2011)
20. Fang, H., Lafarge, F., Pan, C., Huang, H.: Floorplan generation from 3D point clouds: a space partitioning approach. ISPRS J. Photogram. Remote Sens. 175, 44–55 (2021)
21. Fang, H., Pan, C., Huang, H.: Structure-aware indoor scene reconstruction via two levels of abstraction. ISPRS J. Photogram. Remote Sens. 178, 155–170 (2021)
22. Farin, D., Effelsberg, W., de With, P.H.: Floor-plan reconstruction from panoramic images. In: Proceedings of the 15th ACM International Conference on Multimedia (2007)
23. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
24. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Reconstructing building interiors from images. In: ICCV (2009)
25. Gargallo, P., Kuang, Y., et al.: OpenSfM (2016)
26. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
27. Jin, L., Qian, S., Owens, A., Fouhey, D.F.: Planar surface reconstruction from sparse views. In: ICCV (2021)
28. Kim, Y.M., Dolson, J., Sokolsky, M., Koltun, V., Thrun, S.: Interactive acquisition of residential floor plans. In: ICRA (2012)
29. Laskar, Z., Melekhov, I., Kalia, S., Kannala, J.: Camera relocalization by computing pairwise relative poses using convolutional neural network. In: ICCV Workshops (2017)
30. Lin, C., Li, C., Wang, W.: Floorplan-jigsaw: jointly estimating scene layout and aligning partial scans. In: ICCV (2019)
31. Liu, C., Wu, J., Furukawa, Y.: FloorNet: a unified framework for floorplan reconstruction from 3D scans. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 203–219. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_13
32. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
33. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60(1), 63–86 (2004)
34. Moulon, P., Monasse, P., Marlet, R.: Global fusion of relative motions for robust, accurate and scalable structure from motion. In: ICCV (2013)
35. Moulon, P., Monasse, P., Perrot, R., Marlet, R.: OpenMVG: open multiple view geometry. In: Kerautret, B., Colom, M., Monasse, P. (eds.) RRPR 2016. LNCS, vol. 10214, pp. 60–74. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56414-2_5
36. Okorn, B., Xiong, X., Akinci, B., Huber, D.: Toward automated modeling of floor plans. In: 3D DPVT (2010)
37. Oskarsson, M.: Two-view orthographic epipolar geometry: minimal and optimal solvers. J. Math. Imaging Vis. 60(2), 163–173 (2018)
38. Ozyesil, O., Voroninski, V., Basri, R., Singer, A.: A survey of structure from motion. Acta Numerica 26, 305–364 (2017)
39. Pintore, G., Ganovelli, F., Pintus, R., Scopigno, R., Gobbetti, E.: 3D floor plan recovery from overlapping spherical images. Comput. Visual Media 4(4), 367–383 (2018)
40. Pintore, G., Ganovelli, F., Villanueva, A.J., Gobbetti, E.: Automatic modeling of cluttered multi-room floor plans from panoramic images. Comput. Graph. Forum 38(7) (2019)
41. Pintore, G., Mura, C., Ganovelli, F., Fuentes-Perez, L., Pajarola, R., Gobbetti, E.: State-of-the-art in automatic 3D reconstruction of structured indoor environments. Comput. Graph. Forum 39(2) (2020)
42. Purushwalkam, S., et al.: Audio-visual floorplan reconstruction. In: ICCV (2021)
43. Reddy, B., Chatterji, B.: An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 5(8), 1266–1271 (1996)
44. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)
45. Shabani, M.A., Song, W., Odamaki, M., Fujiki, H., Furukawa, Y.: Extreme structure from motion for indoor panoramas without visual overlaps. In: ICCV (2021)
46. Shen, J., Yin, Y., Li, L., Shang, L., Zhang, M., Liu, Q.: Generate & Rank: a multi-task framework for math word problems. In: Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics (2021)
47. Son, K., Moreno, D., Hays, J., Cooper, D.B.: Solving small-piece jigsaw puzzles by growing consensus. In: CVPR (2016)
48. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: CVPR (2017)
49. Stekovic, S., Rad, M., Fraundorfer, F., Lepetit, V.: MonteFloor: extending MCTS for reconstructing accurate large-scale floor plans. In: ICCV (2021)
50. Sun, C., Hsiao, C.W., Sun, M., Chen, H.T.: HorizonNet: learning room layout with 1D representation and pano stretch data augmentation. In: CVPR (2019)
51. Sun, C., Sun, M., Chen, H.T.: HoHoNet: 360 indoor holistic understanding with latent horizontal features. In: CVPR (2021)
52. Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: CVPR (2021)
53. Sweeney, C., Hollerer, T., Turk, M.: Theia: a fast and scalable structure-from-motion library. In: Proceedings of the 23rd ACM International Conference on Multimedia (2015)
54. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_5
55. Yang, Z., Pan, J.Z., Luo, L., Zhou, X., Grauman, K., Huang, Q.: Extreme relative pose estimation for RGB-D scans via scene completion. In: CVPR (2019)
56. Yang, Z., Yan, S., Huang, Q.: Extreme relative pose network under hybrid representations. In: CVPR (2020)
57. Zach, C., Klopschitz, M., Pollefeys, M.: Disambiguating visual relations using loop constraints. In: CVPR (2010)
58. Zhang, F., Nauata, N., Furukawa, Y.: Conv-MPN: convolutional message passing neural network for structured outdoor architecture reconstruction. In: CVPR (2020)
59. Zhang, Y., Song, S., Tan, P., Xiao, J.: PanoContext: a whole-room 3D context model for panoramic scene understanding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 668–686. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_43
60. Zheng, J., Zhang, J., Li, J., Tang, R., Gao, S., Zhou, Z.: Structured3D: a large photo-realistic dataset for structured 3D modeling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 519–535. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_30
61. Zou, C., et al.: Manhattan room layout reconstruction from a single 360° image: a comparative study of state-of-the-art methods. Int. J. Comput. Vis. 129(5), 1410–1431 (2021)

Author information

Authors and Affiliations

Zillow Group, Seattle, USA
Yuguang Li, Ivaylo Boyadzhiev, Lambert Wixson, Manjunath Narayana, Will Hutchcroft & Sing Bing Kang
Georgia Institute of Technology, Atlanta, USA
John Lambert, James Hays & Frank Dellaert

Corresponding author

Correspondence to John Lambert.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 18460 KB)

Cite this paper

Lambert, J. et al. (2022). SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13691. Springer, Cham. https://doi.org/10.1007/978-3-031-19821-2_37

DOI: https://doi.org/10.1007/978-3-031-19821-2_37
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19820-5
Online ISBN: 978-3-031-19821-2
eBook Packages: Computer Science, Computer Science (R0)
