
Volumetric structure extraction in a single image

  • Original article
  • Published:
The Visual Computer

Abstract

High-level structure (HLS) extraction recovers 3D elements on human-made surfaces (objects, buildings, ground, etc.). Several approaches to HLS extraction exist; however, most of them process two or more images captured from different camera views, or 3D data in the form of point clouds extracted from camera images. In general, point cloud and multi-view approaches perform well on scenes with video or image sequences, but they require sufficient parallax to guarantee accuracy. To address this problem, an alternative is to process a single RGB image, interpreting the areas of the image where human-made structure may be observed; this removes the parallax dependency but adds the challenge of resolving image ambiguities correctly. Motivated by the latter, we propose a methodology for 3D volumetric structure extraction from a single image. Our strategy is to divide and simplify the 3D structure extraction process into three steps. First, a structure recognition step provides the segmentation, location, and delimitation of the urbanized structures in the scene. Second, a graph analysis classifies and locates the boundaries between the different urbanized structures. Third, a proposed CNN combined with the pinhole camera model extracts the 3D volumetric structure. Finally, we evaluate this methodology on synthetic and public datasets.
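The third step relies on the standard pinhole camera model to lift image measurements into 3D. As a minimal sketch (not the authors' implementation; the intrinsics `fx`, `fy`, `cx`, `cy` and the example values are purely illustrative), a pixel with an estimated depth can be back-projected to a 3D point in the camera frame:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth Z to a 3D camera-frame
    point using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Illustrative intrinsics: focal length 500 px, principal point at the
# center of a 640x480 image.
p = backproject(u=400, v=300, depth=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# p == (0.32, 0.24, 2.0)
```

Given a per-pixel depth estimate (e.g. from a CNN), applying this back-projection to each pixel of a segmented structure yields the volumetric geometry of that structure.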


Notes

  1. Parallax is the apparent angular displacement of an object between images captured from different camera positions in an image sequence.
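Under a simple pinhole model, the parallax angle can be approximated from the pixel displacement of a point between two frames. This sketch is only illustrative (the focal length and pixel coordinates are assumed values, not from the paper):

```python
import math

def parallax_angle(x1_px, x2_px, focal_px):
    """Approximate parallax angle (radians) between the viewing rays of the
    same point observed at horizontal pixel offsets x1_px and x2_px from the
    principal point in two frames, for a pinhole camera with focal length
    focal_px (in pixels)."""
    return abs(math.atan2(x1_px, focal_px) - math.atan2(x2_px, focal_px))

# A point that shifts from 50 px to 150 px off-axis with a 500 px focal
# length subtends roughly 11 degrees of parallax.
angle_deg = math.degrees(parallax_angle(50, 150, 500))
```

When this angle is too small (distant scenes, short baselines), triangulation becomes ill-conditioned, which is why multi-view approaches need sufficient parallax to guarantee accuracy.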


Author information


Corresponding author

Correspondence to Jose Martinez-Carranza.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

de Jesús Osuna-Coutiño, J.A., Martinez-Carranza, J. Volumetric structure extraction in a single image. Vis Comput 38, 2899–2921 (2022). https://doi.org/10.1007/s00371-021-02163-w

