SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13691)

Abstract

We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360\(^\circ \) panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized using GTSAM [16]. Once the room poses are computed, room layouts are inferred using HorizonNet [50], and the floorplan is constructed by stitching the most confident layout boundaries. We validate our system qualitatively and quantitatively as well as through ablation studies, showing that it outperforms state-of-the-art SfM systems in completeness by over 200%, without sacrificing accuracy. Our results point to the significance of our work: poses of 81% of panoramas are localized in the first 2 connected components (CCs), and 89% in the first 3 CCs.
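The pose-graph step and the connected-component (CC) statistic above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes verified pairwise alignments are given as 2D relative poses `(x, y, theta)`, initializes global poses by chaining them breadth-first within each component (the GTSAM optimization is omitted), and reads "first k CCs" as the k largest components. The names `compose`, `invert`, `init_poses`, and `fraction_localized` are hypothetical.

```python
import math
from collections import defaultdict, deque


def compose(a, b):
    """Compose two SE(2) poses (x, y, theta): apply b in the frame of a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def invert(p):
    """Inverse of an SE(2) pose: (R, t) -> (R^T, -R^T t)."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)


def init_poses(num_panos, verified_edges):
    """Chain relative poses breadth-first inside each connected component.

    verified_edges maps (i, j) to the pose of panorama j in the frame of
    panorama i. Returns one {pano_id: global_pose} dict per component,
    largest component first. Each component has its own arbitrary origin.
    """
    adj = defaultdict(list)
    for (i, j), i_T_j in verified_edges.items():
        adj[i].append((j, i_T_j))
        adj[j].append((i, invert(i_T_j)))

    components, seen = [], set()
    for root in range(num_panos):
        if root in seen:
            continue
        seen.add(root)
        poses = {root: (0.0, 0.0, 0.0)}  # root defines the component frame
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j, i_T_j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    poses[j] = compose(poses[i], i_T_j)
                    queue.append(j)
        components.append(poses)
    components.sort(key=len, reverse=True)
    return components


def fraction_localized(components, num_panos, k):
    """Fraction of panoramas that fall in the k largest components."""
    return sum(len(c) for c in components[:k]) / num_panos


# Toy tour: 5 panoramas, 3 verified pairwise alignments (made-up values).
edges = {
    (0, 1): (1.0, 0.0, math.pi / 2),
    (1, 2): (1.0, 0.0, 0.0),
    (3, 4): (2.0, 0.0, 0.0),
}
ccs = init_poses(5, edges)
print([sorted(c) for c in ccs])
print(fraction_localized(ccs, 5, 1), fraction_localized(ccs, 5, 2))
```

In a full system the chained poses would only serve as an initial estimate; a factor-graph optimizer such as GTSAM would then refine them using all verified edges, including those that close loops.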

J. Lambert—Work completed during an internship at Zillow Group.

Notes

  1. Openings are constructs that divide a large room into multiple parts [14].

  2. We achieve this orientation assumption via pre-processing that straightens the panoramas using vanishing points [59].

References

  1. Albanis, G., et al.: Pano3D: a holistic benchmark and a solid baseline for 360\(^{\circ }\) depth estimation. In: CVPR Workshops (2021)

  2. Aly, M., Bouguet, J.Y.: Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas. In: 2012 IEEE Workshop on the Applications of Computer Vision (WACV), pp. 1–8 (2012)

  3. Balntas, V., Li, S., Prisacariu, V.: RelocNet: continuous metric learning relocalisation using neural nets. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 782–799. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_46

  4. Bao, S.Y., Savarese, S.: Semantic structure from motion. In: CVPR (2011)

  5. Cabral, R., Furukawa, Y.: Piecewise planar and compact floorplan reconstruction from images. In: CVPR (2014)

  6. Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)

  7. Chen, J., Liu, C., Wu, J., Furukawa, Y.: Floor-SP: inverse CAD for floorplans by sequential room-wise shortest path. In: ICCV (2019)

  8. Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: CVPR (2021)

  9. Choi, S., Kim, J.H.: Fast and reliable minimal relative pose estimation under planar motion. Image Vis. Comput. 69, 103–112 (2018)

  10. Choudhary, S., Trevor, A.J., Christensen, H.I., Dellaert, F.: SLAM with object discovery, modeling and mapping. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1018–1025. IEEE (2014)

  11. Cobbe, K., et al.: Training verifiers to solve math word problems. arXiv:2110.14168 (2021)

  12. Cohen, A., Sattler, T., Pollefeys, M.: Merging the unmatchable: stitching visually disconnected SfM models. In: ICCV (2015)

  13. Cohen, A., Schönberger, J.L., Speciale, P., Sattler, T., Frahm, J.-M., Pollefeys, M.: Indoor-outdoor 3D reconstruction alignment. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 285–300. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_18

  14. Cruz, S., Hutchcroft, W., Li, Y., Khosravan, N., Boyadzhiev, I., Kang, S.B.: Zillow indoor dataset: annotated floor plans with 360\(^{\circ }\) panoramas and 3D room layouts. In: CVPR (2021)

  15. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996 (1996)

  16. Dellaert, F.: Factor graphs and GTSAM: a hands-on introduction. Technical report, Georgia Institute of Technology (2012)

  17. Dellaert, F., Burgard, W., Fox, D., Thrun, S.: Using the condensation algorithm for robust, vision-based mobile robot localization. In: CVPR (1999)

  18. Ding, M., Wang, Z., Sun, J., Shi, J., Luo, P.: CamNet: coarse-to-fine retrieval for camera re-localization. In: ICCV (2019)

  19. Enqvist, O., Kahl, F., Olsson, C.: Non-sequential structure from motion. In: ICCV Workshops (2011)

  20. Fang, H., Lafarge, F., Pan, C., Huang, H.: Floorplan generation from 3D point clouds: a space partitioning approach. ISPRS J. Photogram. Remote Sens. 175, 44–55 (2021)

  21. Fang, H., Pan, C., Huang, H.: Structure-aware indoor scene reconstruction via two levels of abstraction. ISPRS J. Photogram. Remote Sens. 178, 155–170 (2021)

  22. Farin, D., Effelsberg, W., de With, P.H.: Floor-plan reconstruction from panoramic images. In: Proceedings of the 15th ACM International Conference on Multimedia (2007)

  23. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

  24. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Reconstructing building interiors from images. In: ICCV (2009)

  25. Gargallo, P., Kuang, Y., et al.: OpenSfM (2016)

  26. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  27. Jin, L., Qian, S., Owens, A., Fouhey, D.F.: Planar surface reconstruction from sparse views. In: ICCV (2021)

  28. Kim, Y.M., Dolson, J., Sokolsky, M., Koltun, V., Thrun, S.: Interactive acquisition of residential floor plans. In: ICRA (2012)

  29. Laskar, Z., Melekhov, I., Kalia, S., Kannala, J.: Camera relocalization by computing pairwise relative poses using convolutional neural network. In: ICCV Workshops (2017)

  30. Lin, C., Li, C., Wang, W.: Floorplan-jigsaw: jointly estimating scene layout and aligning partial scans. In: ICCV (2019)

  31. Liu, C., Wu, J., Furukawa, Y.: FloorNet: a unified framework for floorplan reconstruction from 3D scans. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 203–219. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_13

  32. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)

  33. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60(1), 63–86 (2004)

  34. Moulon, P., Monasse, P., Marlet, R.: Global fusion of relative motions for robust, accurate and scalable structure from motion. In: ICCV (2013)

  35. Moulon, P., Monasse, P., Perrot, R., Marlet, R.: OpenMVG: open multiple view geometry. In: Kerautret, B., Colom, M., Monasse, P. (eds.) RRPR 2016. LNCS, vol. 10214, pp. 60–74. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56414-2_5

  36. Okorn, B., Xiong, X., Akinci, B., Huber, D.: Toward automated modeling of floor plans. In: 3D DPVT (2010)

  37. Oskarsson, M.: Two-view orthographic epipolar geometry: minimal and optimal solvers. J. Math. Imaging Vis. 60(2), 163–173 (2018)

  38. Ozyesil, O., Voroninski, V., Basri, R., Singer, A.: A survey of structure from motion. Acta Numerica 26, 305–364 (2017)

  39. Pintore, G., Ganovelli, F., Pintus, R., Scopigno, R., Gobbetti, E.: 3D floor plan recovery from overlapping spherical images. Comput. Visual Media 4(4), 367–383 (2018)

  40. Pintore, G., Ganovelli, F., Villanueva, A.J., Gobbetti, E.: Automatic modeling of cluttered multi-room floor plans from panoramic images. Comput. Graph. Forum 38(7) (2019)

  41. Pintore, G., Mura, C., Ganovelli, F., Fuentes-Perez, L., Pajarola, R., Gobbetti, E.: State-of-the-art in automatic 3D reconstruction of structured indoor environments. Comput. Graph. Forum 39(2) (2020)

  42. Purushwalkam, S., et al.: Audio-visual floorplan reconstruction. In: ICCV (2021)

  43. Reddy, B., Chatterji, B.: An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 5(8), 1266–1271 (1996)

  44. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)

  45. Shabani, M.A., Song, W., Odamaki, M., Fujiki, H., Furukawa, Y.: Extreme structure from motion for indoor panoramas without visual overlaps. In: ICCV (2021)

  46. Shen, J., Yin, Y., Li, L., Shang, L., Zhang, M., Liu, Q.: Generate & Rank: a multi-task framework for math word problems. In: Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics (2021)

  47. Son, K., Moreno, D., Hays, J., Cooper, D.B.: Solving small-piece jigsaw puzzles by growing consensus. In: CVPR (2016)

  48. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: CVPR (2017)

  49. Stekovic, S., Rad, M., Fraundorfer, F., Lepetit, V.: MonteFloor: extending MCTS for reconstructing accurate large-scale floor plans. In: ICCV (2021)

  50. Sun, C., Hsiao, C.W., Sun, M., Chen, H.T.: HorizonNet: learning room layout with 1D representation and pano stretch data augmentation. In: CVPR (2019)

  51. Sun, C., Sun, M., Chen, H.T.: HoHoNet: 360 indoor holistic understanding with latent horizontal features. In: CVPR (2021)

  52. Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: CVPR (2021)

  53. Sweeney, C., Hollerer, T., Turk, M.: Theia: a fast and scalable structure-from-motion library. In: Proceedings of the 23rd ACM International Conference on Multimedia (2015)

  54. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_5

  55. Yang, Z., Pan, J.Z., Luo, L., Zhou, X., Grauman, K., Huang, Q.: Extreme relative pose estimation for RGB-D scans via scene completion. In: CVPR (2019)

  56. Yang, Z., Yan, S., Huang, Q.: Extreme relative pose network under hybrid representations. In: CVPR (2020)

  57. Zach, C., Klopschitz, M., Pollefeys, M.: Disambiguating visual relations using loop constraints. In: CVPR (2010)

  58. Zhang, F., Nauata, N., Furukawa, Y.: Conv-MPN: convolutional message passing neural network for structured outdoor architecture reconstruction. In: CVPR (2020)

  59. Zhang, Y., Song, S., Tan, P., Xiao, J.: PanoContext: a whole-room 3D context model for panoramic scene understanding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 668–686. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_43

  60. Zheng, J., Zhang, J., Li, J., Tang, R., Gao, S., Zhou, Z.: Structured3D: a large photo-realistic dataset for structured 3D modeling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 519–535. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_30

  61. Zou, C., et al.: Manhattan room layout reconstruction from a single 360\(^{\circ }\) image: a comparative study of state-of-the-art methods. Int. J. Comput. Vis. 129(5), 1410–1431 (2021)

Author information

Corresponding author

Correspondence to John Lambert.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 18460 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lambert, J. et al. (2022). SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13691. Springer, Cham. https://doi.org/10.1007/978-3-031-19821-2_37

  • DOI: https://doi.org/10.1007/978-3-031-19821-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19820-5

  • Online ISBN: 978-3-031-19821-2

  • eBook Packages: Computer Science (R0)
