Abstract
In this paper, we propose a structural deep metric learning (SDML) method for room layout estimation, which aims to recover the 3D spatial layout of a cluttered indoor scene from a monocular RGB image. Different from existing room layout estimation methods that solve a regression or per-pixel classification problem, we formulate the room layout estimation problem from a metric learning perspective where we explicitly model the structural relations across different images. We propose to learn a latent embedding space where the Euclidean distance can characterize the actual structural difference between the layouts of two rooms. We then minimize the discrepancy between an image and its ground-truth layout in the learned embedding space. We employ a metric model and a layout encoder to map the RGB images and the ground-truth layouts to the embedding space, respectively, and a layout decoder to map the embeddings to the corresponding layouts, where the whole framework is trained in an end-to-end manner. We perform experiments on the widely used Hedau and LSUN datasets and achieve state-of-the-art performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Boniardi, F., Valada, A., Mohan, R., Caselitz, T., Burgard, W.: Robot localization in floor plans using a room layout edge extraction network. In: Proceedings of the IROS, pp. 5291–5297 (2019)
Coughlan, J.M., Yuille, A.L.: The manhattan world assumption: regularities in scene statistics which enable Bayesian inference. In: Proceedings of the NIPS, pp. 845–851 (2001)
Dasgupta, S., Fang, K., Chen, K., Savarese, S.: DeLay: robust spatial layout estimation for cluttered indoor scenes. In: Proceedings of the CVPR, pp. 616–624 (2016)
Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the ICML, pp. 209–216 (2007)
Del Pero, L., Bowdish, J., Fried, D., Kermgard, B., Hartley, E., Barnard, K.: Bayesian geometric modeling of indoor scenes. In: Proceedings of the CVPR, pp. 2719–2726 (2012)
Del Pero, L., Bowdish, J., Kermgard, B., Hartley, E., Barnard, K.: Understanding Bayesian rooms using composite 3D object models. In: Proceedings of the CVPR, pp. 153–160 (2013)
Duan, Y., Zheng, W., Lin, X., Lu, J., Zhou, J.: Deep adversarial metric learning. In: Proceedings of the CVPR, pp. 2780–2789 (2018)
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the ICML, pp. 2650–2658 (2015)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the NIPS, pp. 2366–2374 (2014)
Fix, E., Hodges Jr., J.L.: Discriminatory analysis-nonparametric discrimination: consistency properties. Technical report, California Univ Berkeley (1951)
Globerson, A., Roweis, S.T.: Metric learning by collapsing classes. In: Proceedings of the NIPS, pp. 451–458 (2006)
Gupta, A., Hebert, M., Kanade, T., Blei, D.M.: Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In: Proceedings of the NIPS, pp. 1288–1296 (2010)
Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of the CVPR, pp. 4731–4740 (2015)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Proceedings of the CVPR, pp. 1735–1742 (2006)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)
Hedau, V., Hoiem, D., Forsyth, D.: Recovering the spatial layout of cluttered rooms. In: Proceedings of the ICCV, pp. 1849–1856 (2009)
Hirzer, M., Roth, P.M., Lepetit, V.: Smart hypothesis generation for efficient and robust room layout estimation. In: Proceedings of the WACV, pp. 2912–2920 (2020)
Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., Smola, A.J.: Correcting sample selection bias by unlabeled data. In: Proceedings of the NIPS, pp. 601–608 (2007)
Kim, S., Seo, M., Laptev, I., Cho, M., Kwak, S.: Deep metric learning beyond binary supervision. In: Proceedings of the CVPR., pp. 2288–2297 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)
Kwak, S., Cho, M., Laptev, I.: Thin-slicing for pose: learning to understand pose without explicit pose estimation. In: Proceedings of the CVPR, pp. 4938–4947 (2016)
Lee, C.Y., Badrinarayanan, V., Malisiewicz, T., Rabinovich, A.: RoomNet: end-to-end room layout estimation. In: Proceedings of the ICCV, pp. 4865–4874 (2017)
Lee, D.C., Gupta, A., Hebert, M., Kanade, T., Blei, D.M.: Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In: Proceedings of the NIPS, pp. 1288–1296 (2010)
Lee, D.C., Hebert, M., Kanade, T.: Geometric reasoning for single image structure recovery. In: Proceedings of the CVPR, pp. 2136–2143 (2009)
Lin, C., Li, C., Furukawa, Y., Wang, W.: Floorplan priors for joint camera pose and room layout estimation. arXiv abs/1812.06677 (2018)
Liu, C., Schwing, A.G., Kundu, K., Urtasun, R., Fidler, S.: Rent3D: floor-plan priors for monocular layout estimation. In: Proceedings of the CVPR, pp. 3413–3421 (2015)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the CVPR, pp. 3431–3440 (2015)
Mallya, A., Lazebnik, S.: Learning informative edge maps for indoor scene layout prediction. In: Proceedings of the ICCV, pp. 936–944 (2015)
Mirowski, P., et al.: Learning to navigate in complex environments. In: Proceedings of the ICLR (2017)
Ramalingam, S., Pillai, J.K., Jain, A., Taguchi, Y.: Manhattan junction catalogue for spatial reasoning of indoor scenes. In: Proceedings of the CVPR, pp. 3065–3072 (2013)
Ren, Y., Li, S., Chen, C., Kuo, C.-C.J.: A coarse-to-fine indoor layout estimation (CFILE) method. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10115, pp. 36–51. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54193-8_3
Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vision 77(1–3), 157–173 (2008)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the CVPR, pp. 815–823 (2015)
Schwing, A.G., Hazan, T., Pollefeys, M., Urtasun, R.: Efficient structured prediction for 3D indoor scene understanding. In: Proceedings of the CVPR, pp. 2815–2822 (2012)
Schwing, A.G., Urtasun, R.: Efficient exact inference for 3D indoor scene understanding. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 299–313. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33783-3_22
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the ICLR (2015)
Sohn, K.: Improved deep metric learning with multi-class N-pair loss objective. In: Proceedings of the NIPS, pp. 1857–1865 (2016)
Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of the CVPR, pp. 4004–4012 (2016)
Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the CVPR, pp. 567–576 (2015)
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the CVPR, pp. 1–9 (2015)
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6(Sep), 1453–1484 (2005)
Wang, H., Gould, S., Koller, D.: Discriminative learning with latent variables for cluttered indoor scene understanding. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 497–510. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_36
Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10(2), 207–244 (2009)
Weisstein, E.W.: CRC Concise Encyclopedia of Mathematics. Chapman and Hall/CRC, New York (2002)
Xiao, J., Furukawa, Y.: Reconstructing the world’s museums. Int. J. Comput. Vision 110(3), 243–258 (2014)
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from Abbey to Zoo. In: Proceedings of the CVPR, pp. 3485–3492 (2010)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: Proceedings of the ICLR (2016)
Zhang, W., Zhang, W., Gu, J.: Edge-semantic learning strategy for layout estimation in indoor environment. TCYB (2019)
Zhang, Y., Yu, F., Song, S., Xu, P., Seff, A., Xiao, J.: Large-scale scene understanding challenge: room layout estimation. In: CVPR Workshop (2015)
Zhao, H., Lu, M., Yao, A., Guo, Y., Chen, Y., Zhang, L.: Physics inspired optimization on semantic transfer features: an alternative method for room layout estimation. In: Proceedings of the CVPR, pp. 10–18 (2017)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the CVPR, pp. 2881–2890 (2017)
Zhao, Y., Zhu, S.C.: Scene parsing by integrating function, geometry and appearance models. In: Proceedings of the CVPR, pp. 3119–3126 (2013)
Zheng, W., Chen, Z., Lu, J., Zhou, J.: Hardness-aware deep metric learning. In: Proceedings of the CVPR, pp. 72–81 (2019)
Zhu, F., Zhu, L., Yang, Y.: Sim-real joint reinforcement transfer for 3D indoor navigation. In: Proceedings of the CVPR, pp. 11388–11397 (2019)
Zou, C., Colburn, A., Shan, Q., Hoiem, D.: LayoutNet: reconstructing the 3D room layout from a single RGB image. In: Proceedings of the CVPR, pp. 2051–2059 (2018)
Acknowledgements
The authors would like to thank Yangyang Song for his kind support and helpful discussions. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Natural Science Foundation under Grant No. L172051, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, W., Lu, J., Zhou, J. (2020). Structural Deep Metric Learning for Room Layout Estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_43
Download citation
DOI: https://doi.org/10.1007/978-3-030-58523-5_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5
eBook Packages: Computer ScienceComputer Science (R0)