
Indoor scene modeling from a single image using normal inference and edge features

  • Original Article
  • The Visual Computer

Abstract

We present an interactive approach for semantically modeling an indoor environment from a single indoor image, without requiring access to the scene or additional measurement devices such as RGBD cameras. Our key insight is that, although depth estimation from a single image is notoriously difficult, a relatively accurate normal map can be obtained conveniently, and it conveys a great deal of scene geometry. This enables us to model each object in a data-driven manner: we represent the object as a normal-based graph and retrieve a similar model from the database by graph matching. Edge information is integrated to further improve the retrieval result. We hypothesize a set of sparse surface orientations for the image and refine them in an intuitive and straightforward manner. With a small amount of simple user interaction, our approach generates a plausible model of the scene. To verify the effectiveness of the proposed method, we show modeling results on a variety of indoor images.
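The idea of a normal-based graph and retrieval by graph matching can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the graph attributes or the matching algorithm, so the sketch assumes that nodes are planar regions with unit normals, that edges connect adjacent regions and store the angle between their normals, and that matching is a brute-force search over node permutations. All names (`build_graph`, `match_cost`, `retrieve`) and the two database entries are hypothetical.

```python
import itertools
import math


def angle(n1, n2):
    """Angle in radians between two unit normals."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.acos(max(-1.0, min(1.0, dot)))


def build_graph(normals, adjacency):
    """Assumed representation: one node per region, one edge per adjacent pair,
    each edge labeled with the angle between the two region normals."""
    edges = {frozenset((i, j)): angle(normals[i], normals[j]) for i, j in adjacency}
    return {"n": len(normals), "edges": edges}


def match_cost(g, h):
    """Brute-force matching cost; only practical for very small graphs."""
    if g["n"] != h["n"]:
        return math.inf
    best = math.inf
    for perm in itertools.permutations(range(h["n"])):
        cost, matched = 0.0, 0
        for key, a in g["edges"].items():
            mapped = frozenset(perm[k] for k in key)
            if mapped in h["edges"]:
                cost += abs(a - h["edges"][mapped])
                matched += 1
            else:
                cost += math.pi  # penalty: edge of g has no counterpart in h
        cost += math.pi * (len(h["edges"]) - matched)  # unmatched edges of h
        best = min(best, cost)
    return best


def retrieve(query, database):
    """Return the name of the database graph with the lowest matching cost."""
    return min(database, key=lambda name: match_cost(query, database[name]))


# Hypothetical data: three mutually adjacent faces per object.
pairs = [(0, 1), (0, 2), (1, 2)]
s = math.sqrt(0.5)
database = {
    "box": build_graph([(1, 0, 0), (0, 1, 0), (0, 0, 1)], pairs),
    "wedge": build_graph([(0, 0, 1), (1, 0, 0), (0, s, s)], pairs),
}
query = build_graph([(0, 0, 1), (0, 1, 0), (1, 0, 0)], pairs)
best_match = retrieve(query, database)
```

Here `best_match` is `"box"`, because every pair of faces in the query is orthogonal, while the wedge has one 45-degree pair. A real system would need a scalable matcher and richer node and edge attributes, such as the edge features the abstract mentions.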


Notes

  1. https://code.google.com/p/vpdetection/.

  2. http://vision.cs.uiuc.edu/~vhedau2/Research/data/groundtruth.zip.


Acknowledgements

The authors would like to thank the reviewers for their constructive comments, which helped improve this paper greatly. This work was supported in part by the National Natural Science Foundation of China under Grants 61373059, 61672279, and 61321491 and by the Natural Science Foundation of Jiangsu Province under Grant BK20150016.

Author information


Corresponding author

Correspondence to Yanwen Guo.


About this article


Cite this article

Liu, M., Guo, Y. & Wang, J. Indoor scene modeling from a single image using normal inference and edge features. Vis Comput 33, 1227–1240 (2017). https://doi.org/10.1007/s00371-016-1348-3


  • DOI: https://doi.org/10.1007/s00371-016-1348-3
