Abstract
High-level structure (HLS) extraction recovers 3D elements of human-made surfaces (objects, buildings, ground, etc.). Most existing approaches to HLS extraction process two or more images captured from different camera views, or process 3D data in the form of point clouds extracted from camera images. In general, point-cloud and multi-view approaches perform well on scenes captured as video or image sequences, but they require sufficient parallax to guarantee accuracy. To address this problem, an alternative is to process a single RGB image, interpreting the areas of the image where human-made structure may be observed; this removes the parallax dependency but adds the challenge of resolving image ambiguities correctly. Motivated by the latter, we propose a methodology for 3D volumetric structure extraction from a single image. Our strategy is to divide and simplify the 3D structure extraction process into three steps. First, a structure recognition step provides the segmentation, location, and delimitation of the urbanized structures in the scene. Second, a graph analysis classifies and locates the boundaries between the different urbanized structures. Third, a proposed CNN and the pinhole camera model extract the 3D volumetric structure. Finally, we evaluate this methodology on synthetic and public datasets.
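The third step relies on the standard pinhole camera model. As a minimal sketch of the back-projection this model implies (standard geometry, not the authors' implementation; the function name and intrinsic values below are illustrative assumptions), a pixel with an estimated depth maps to a camera-frame 3D point as follows:

import numpy as np

def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    # Standard pinhole back-projection: pixel (u, v) with depth Z maps to
    # X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  and Z itself.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Illustrative intrinsics for a 640x480 camera (not values from the paper).
fx = fy = 525.0
cx, cy = 320.0, 240.0
point_3d = backproject_pixel(400, 300, depth=5.0, fx=fx, fy=fy, cx=cx, cy=cy)

Applied to every pixel of a segmented structure for which a depth estimate is available, this back-projection yields the volumetric elements the methodology targets.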
Notes
Parallax is defined as the angle subtended by an object's apparent displacement between the images of a sequence captured from different camera positions.
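As a standard geometric illustration (textbook material, not taken from the paper): a camera translating by a baseline $b$ while observing a point at depth $Z$ produces a parallax angle of approximately

$$\theta \approx \frac{b}{Z}, \qquad d = \frac{f\,b}{Z},$$

where $d$ is the induced pixel displacement for a focal length $f$ expressed in pixels. When $b$ is small relative to $Z$, $d$ vanishes, which is why multi-view reconstruction needs sufficient parallax to remain accurate.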
Cite this article
de Jesús Osuna-Coutiño, J.A., Martinez-Carranza, J. Volumetric structure extraction in a single image. Vis Comput 38, 2899–2921 (2022). https://doi.org/10.1007/s00371-021-02163-w