Skip to main content

Structural Deep Metric Learning for Room Layout Estimation

  • Conference paper
  • First Online:
Book cover Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12363))

Included in the following conference series:

Abstract

In this paper, we propose a structural deep metric learning (SDML) method for room layout estimation, which aims to recover the 3D spatial layout of a cluttered indoor scene from a monocular RGB image. Different from existing room layout estimation methods that solve a regression or per-pixel classification problem, we formulate the room layout estimation problem from a metric learning perspective where we explicitly model the structural relations across different images. We propose to learn a latent embedding space where the Euclidean distance can characterize the actual structural difference between the layouts of two rooms. We then minimize the discrepancy between an image and its ground-truth layout in the learned embedding space. We employ a metric model and a layout encoder to map the RGB images and the ground-truth layouts to the embedding space, respectively, and a layout decoder to map the embeddings to the corresponding layouts, where the whole framework is trained in an end-to-end manner. We perform experiments on the widely used Hedau and LSUN datasets and achieve state-of-the-art performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Boniardi, F., Valada, A., Mohan, R., Caselitz, T., Burgard, W.: Robot localization in floor plans using a room layout edge extraction network. In: Proceedings of the IROS, pp. 5291–5297 (2019)

    Google Scholar 

  2. Coughlan, J.M., Yuille, A.L.: The manhattan world assumption: regularities in scene statistics which enable Bayesian inference. In: Proceedings of the NIPS, pp. 845–851 (2001)

    Google Scholar 

  3. Dasgupta, S., Fang, K., Chen, K., Savarese, S.: DeLay: robust spatial layout estimation for cluttered indoor scenes. In: Proceedings of the CVPR, pp. 616–624 (2016)

    Google Scholar 

  4. Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the ICML, pp. 209–216 (2007)

    Google Scholar 

  5. Del Pero, L., Bowdish, J., Fried, D., Kermgard, B., Hartley, E., Barnard, K.: Bayesian geometric modeling of indoor scenes. In: Proceedings of the CVPR, pp. 2719–2726 (2012)

    Google Scholar 

  6. Del Pero, L., Bowdish, J., Kermgard, B., Hartley, E., Barnard, K.: Understanding Bayesian rooms using composite 3D object models. In: Proceedings of the CVPR, pp. 153–160 (2013)

    Google Scholar 

  7. Duan, Y., Zheng, W., Lin, X., Lu, J., Zhou, J.: Deep adversarial metric learning. In: Proceedings of the CVPR, pp. 2780–2789 (2018)

    Google Scholar 

  8. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the ICML, pp. 2650–2658 (2015)

    Google Scholar 

  9. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the NIPS, pp. 2366–2374 (2014)

    Google Scholar 

  10. Fix, E., Hodges Jr., J.L.: Discriminatory analysis-nonparametric discrimination: consistency properties. Technical report, California Univ Berkeley (1951)

    Google Scholar 

  11. Globerson, A., Roweis, S.T.: Metric learning by collapsing classes. In: Proceedings of the NIPS, pp. 451–458 (2006)

    Google Scholar 

  12. Gupta, A., Hebert, M., Kanade, T., Blei, D.M.: Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In: Proceedings of the NIPS, pp. 1288–1296 (2010)

    Google Scholar 

  13. Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of the CVPR, pp. 4731–4740 (2015)

    Google Scholar 

  14. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Proceedings of the CVPR, pp. 1735–1742 (2006)

    Google Scholar 

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)

    Google Scholar 

  16. Hedau, V., Hoiem, D., Forsyth, D.: Recovering the spatial layout of cluttered rooms. In: Proceedings of the ICCV, pp. 1849–1856 (2009)

    Google Scholar 

  17. Hirzer, M., Roth, P.M., Lepetit, V.: Smart hypothesis generation for efficient and robust room layout estimation. In: Proceedings of the WACV, pp. 2912–2920 (2020)

    Google Scholar 

  18. Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., Smola, A.J.: Correcting sample selection bias by unlabeled data. In: Proceedings of the NIPS, pp. 601–608 (2007)

    Google Scholar 

  19. Kim, S., Seo, M., Laptev, I., Cho, M., Kwak, S.: Deep metric learning beyond binary supervision. In: Proceedings of the CVPR., pp. 2288–2297 (2019)

    Google Scholar 

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012)

    Google Scholar 

  21. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)

    Article  MathSciNet  Google Scholar 

  22. Kwak, S., Cho, M., Laptev, I.: Thin-slicing for pose: learning to understand pose without explicit pose estimation. In: Proceedings of the CVPR, pp. 4938–4947 (2016)

    Google Scholar 

  23. Lee, C.Y., Badrinarayanan, V., Malisiewicz, T., Rabinovich, A.: RoomNet: end-to-end room layout estimation. In: Proceedings of the ICCV, pp. 4865–4874 (2017)

    Google Scholar 

  24. Lee, D.C., Gupta, A., Hebert, M., Kanade, T., Blei, D.M.: Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In: Proceedings of the NIPS, pp. 1288–1296 (2010)

    Google Scholar 

  25. Lee, D.C., Hebert, M., Kanade, T.: Geometric reasoning for single image structure recovery. In: Proceedings of the CVPR, pp. 2136–2143 (2009)

    Google Scholar 

  26. Lin, C., Li, C., Furukawa, Y., Wang, W.: Floorplan priors for joint camera pose and room layout estimation. arXiv abs/1812.06677 (2018)

  27. Liu, C., Schwing, A.G., Kundu, K., Urtasun, R., Fidler, S.: Rent3D: floor-plan priors for monocular layout estimation. In: Proceedings of the CVPR, pp. 3413–3421 (2015)

    Google Scholar 

  28. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the CVPR, pp. 3431–3440 (2015)

    Google Scholar 

  29. Mallya, A., Lazebnik, S.: Learning informative edge maps for indoor scene layout prediction. In: Proceedings of the ICCV, pp. 936–944 (2015)

    Google Scholar 

  30. Mirowski, P., et al.: Learning to navigate in complex environments. In: Proceedings of the ICLR (2017)

    Google Scholar 

  31. Ramalingam, S., Pillai, J.K., Jain, A., Taguchi, Y.: Manhattan junction catalogue for spatial reasoning of indoor scenes. In: Proceedings of the CVPR, pp. 3065–3072 (2013)

    Google Scholar 

  32. Ren, Y., Li, S., Chen, C., Kuo, C.-C.J.: A coarse-to-fine indoor layout estimation (CFILE) method. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10115, pp. 36–51. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54193-8_3

    Chapter  Google Scholar 

  33. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vision 77(1–3), 157–173 (2008)

    Article  Google Scholar 

  34. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the CVPR, pp. 815–823 (2015)

    Google Scholar 

  35. Schwing, A.G., Hazan, T., Pollefeys, M., Urtasun, R.: Efficient structured prediction for 3D indoor scene understanding. In: Proceedings of the CVPR, pp. 2815–2822 (2012)

    Google Scholar 

  36. Schwing, A.G., Urtasun, R.: Efficient exact inference for 3D indoor scene understanding. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 299–313. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33783-3_22

    Chapter  Google Scholar 

  37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the ICLR (2015)

    Google Scholar 

  38. Sohn, K.: Improved deep metric learning with multi-class N-pair loss objective. In: Proceedings of the NIPS, pp. 1857–1865 (2016)

    Google Scholar 

  39. Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of the CVPR, pp. 4004–4012 (2016)

    Google Scholar 

  40. Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the CVPR, pp. 567–576 (2015)

    Google Scholar 

  41. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the CVPR, pp. 1–9 (2015)

    Google Scholar 

  42. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6(Sep), 1453–1484 (2005)

    MathSciNet  MATH  Google Scholar 

  43. Wang, H., Gould, S., Koller, D.: Discriminative learning with latent variables for cluttered indoor scene understanding. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 497–510. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_36

    Chapter  Google Scholar 

  44. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10(2), 207–244 (2009)

    MATH  Google Scholar 

  45. Weisstein, E.W.: CRC Concise Encyclopedia of Mathematics. Chapman and Hall/CRC, New York (2002)

    Book  Google Scholar 

  46. Xiao, J., Furukawa, Y.: Reconstructing the world’s museums. Int. J. Comput. Vision 110(3), 243–258 (2014)

    Article  Google Scholar 

  47. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from Abbey to Zoo. In: Proceedings of the CVPR, pp. 3485–3492 (2010)

    Google Scholar 

  48. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: Proceedings of the ICLR (2016)

    Google Scholar 

  49. Zhang, W., Zhang, W., Gu, J.: Edge-semantic learning strategy for layout estimation in indoor environment. TCYB (2019)

    Google Scholar 

  50. Zhang, Y., Yu, F., Song, S., Xu, P., Seff, A., Xiao, J.: Large-scale scene understanding challenge: room layout estimation. In: CVPR Workshop (2015)

    Google Scholar 

  51. Zhao, H., Lu, M., Yao, A., Guo, Y., Chen, Y., Zhang, L.: Physics inspired optimization on semantic transfer features: an alternative method for room layout estimation. In: Proceedings of the CVPR, pp. 10–18 (2017)

    Google Scholar 

  52. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the CVPR, pp. 2881–2890 (2017)

    Google Scholar 

  53. Zhao, Y., Zhu, S.C.: Scene parsing by integrating function, geometry and appearance models. In: Proceedings of the CVPR, pp. 3119–3126 (2013)

    Google Scholar 

  54. Zheng, W., Chen, Z., Lu, J., Zhou, J.: Hardness-aware deep metric learning. In: Proceedings of the CVPR, pp. 72–81 (2019)

    Google Scholar 

  55. Zhu, F., Zhu, L., Yang, Y.: Sim-real joint reinforcement transfer for 3D indoor navigation. In: Proceedings of the CVPR, pp. 11388–11397 (2019)

    Google Scholar 

  56. Zou, C., Colburn, A., Shan, Q., Hoiem, D.: LayoutNet: reconstructing the 3D room layout from a single RGB image. In: Proceedings of the CVPR, pp. 2051–2059 (2018)

    Google Scholar 

Download references

Acknowledgements

The authors would like to thank Yangyang Song for his kind support and helpful discussions. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Natural Science Foundation under Grant No. L172051, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jiwen Lu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zheng, W., Lu, J., Zhou, J. (2020). Structural Deep Metric Learning for Room Layout Estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_43

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58523-5_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58522-8

  • Online ISBN: 978-3-030-58523-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics