Indoor scene modeling from a single image using normal inference and edge features

Liu, Mingming; Guo, Yanwen; Wang, Jun

doi:10.1007/s00371-016-1348-3

Original Article
Published: 11 January 2017

Volume 33, pages 1227–1240, (2017)

The Visual Computer

Mingming Liu¹,
Yanwen Guo¹ &
Jun Wang²

Abstract

We present an interactive approach for semantically modeling an indoor environment from a single indoor image, without requiring access to the scene or additional measurements such as those from RGBD cameras. Our key insight is that, although depth estimation from a single image is notoriously difficult, a relatively accurate normal map can be obtained conveniently, and it conveys a great deal of scene geometry. This enables us to model each object in a data-driven manner by representing the object as a normal-based graph and retrieving a similar model from a database by graph matching. Moreover, edge information is integrated to further improve the retrieval results. We hypothesize a set of sparse surface orientations for the image and further refine them in an intuitive and straightforward manner. With a small amount of simple user interaction, our approach is able to generate a plausible model of the scene. To verify the effectiveness of the proposed method, we show modeling results on a variety of indoor images.
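
The retrieval idea in the abstract (an object represented as a normal-based graph and matched against database models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `NormalGraph` class, the scoring rule (sum of normal dot products plus a count of preserved adjacencies), the brute-force search over node assignments, and the toy database are all hypothetical. The paper's actual graph construction, matching algorithm, and edge features are not described in this abstract.

```python
from itertools import permutations
import math


def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class NormalGraph:
    """Hypothetical object representation: one node per planar region
    (labelled with its surface normal), one edge per pair of adjacent regions."""

    def __init__(self, normals, edges):
        self.normals = [unit(v) for v in normals]
        self.edges = {frozenset(e) for e in edges}


def match_score(query, model):
    """Best score over all injective query -> model node assignments.

    Brute force, so only suitable for very small graphs. The score is an
    assumed toy measure: normal agreement plus number of preserved edges.
    """
    nq, nm = len(query.normals), len(model.normals)
    if nq > nm:
        return float("-inf")
    best = float("-inf")
    for assign in permutations(range(nm), nq):
        node_term = sum(dot(query.normals[i], model.normals[assign[i]])
                        for i in range(nq))
        edge_term = sum(1 for e in query.edges
                        if frozenset(assign[i] for i in e) in model.edges)
        best = max(best, node_term + edge_term)
    return best


def retrieve(query, database):
    """Return the name of the database model with the highest match score."""
    return max(database, key=lambda name: match_score(query, database[name]))


# Toy database (hypothetical): a box corner with three mutually adjacent
# faces, and a slab with two opposite, non-adjacent faces.
database = {
    "box": NormalGraph([(0, 0, 1), (1, 0, 0), (0, 1, 0)],
                       [(0, 1), (0, 2), (1, 2)]),
    "slab": NormalGraph([(0, 0, 1), (0, 0, -1)], []),
}

# Query: two adjacent visible faces, one facing up and one facing sideways.
query = NormalGraph([(0, 0, 2), (3, 0, 0)], [(0, 1)])
best_model = retrieve(query, database)
print(best_model)
```

Exhaustive assignment search grows factorially with the number of nodes, so a practical system would need a dedicated graph matching solver; the sketch only shows how normals on nodes and adjacency on edges can jointly drive retrieval.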

Acknowledgements

The authors would like to thank the reviewers for their constructive comments, which helped improve this paper greatly. This work was supported in part by the National Natural Science Foundation of China under Grants 61373059, 61672279, and 61321491 and the Natural Science Foundation of Jiangsu Province under Grant BK20150016.

Author information

Authors and Affiliations

State Key Lab for Novel Software Technology, Nanjing University, Nanjing, People’s Republic of China
Mingming Liu & Yanwen Guo
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, People’s Republic of China
Jun Wang

Corresponding author

Correspondence to Yanwen Guo.

About this article

Cite this article

Liu, M., Guo, Y. & Wang, J. Indoor scene modeling from a single image using normal inference and edge features. Vis Comput 33, 1227–1240 (2017). https://doi.org/10.1007/s00371-016-1348-3

Published: 11 January 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00371-016-1348-3
