Abstract
High-Level Structure (HLS) recognition locates elements on human-made surfaces (objects, buildings, ground, etc.). Several approaches to HLS recognition exist; however, most of them process 3D data in the form of point clouds extracted from camera images. In general, 3D point cloud approaches perform well for scenes captured as video or image sequences, but they require sufficient parallax to guarantee accuracy. To address this problem, an alternative is to process a single RGB image, seeking to interpret the image regions where human-made structure may be observed. This removes the parallax dependency but adds the challenge of interpreting image ambiguities correctly. Motivated by the latter, this work presents a novel methodology for HLS recognition from a single image using a CNN-superpixel approach. Our approach has three steps. First, the superpixel and centroid analysis obtains the superpixel to analyze and its corresponding RGB section, a portion of the input image that our CNN uses to assign a label. Second, the structure recognition step segments, locates, and delimits the urbanized structures in the scene. For that, we propose a CNN-superpixel configuration that combines the abstraction power of deep learning with the fast computational processing of superpixel segmentation. Third, the connectivity analysis refines each superpixel label by considering the connections between neighboring superpixels. Experimental results are encouraging: our approach performs well under real-world scenarios, and the proposed methodology is 6.53 to 12.18 times faster than previous work.
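The three steps above can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it uses a toy square-grid segmentation as a stand-in for a real superpixel algorithm (e.g. SLIC), a dictionary of per-superpixel class labels as a stand-in for the CNN's output, and a majority vote over adjacent superpixels as one plausible form of the connectivity analysis. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def grid_superpixels(h, w, cell=16):
    # Toy stand-in for a superpixel algorithm such as SLIC:
    # label the image in square cells of side `cell`.
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    n_cols = (w + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]

def centroid_patch(image, labels, sp_id, size=32):
    # Step 1 (sketch): crop the RGB section centred on a superpixel's
    # centroid; this patch is what a CNN would classify in step 2.
    ys, xs = np.nonzero(labels == sp_id)
    cy, cx = int(ys.mean()), int(xs.mean())
    half = size // 2
    y0 = int(np.clip(cy - half, 0, image.shape[0] - size))
    x0 = int(np.clip(cx - half, 0, image.shape[1] - size))
    return image[y0:y0 + size, x0:x0 + size]

def relabel_by_neighbors(labels, sp_class):
    # Step 3 (sketch): replace each superpixel's class by the majority
    # class among its 4-connected neighbouring superpixels plus itself.
    neighbors = {s: set() for s in np.unique(labels)}
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            neighbors[a].add(b); neighbors[b].add(a)
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            neighbors[a].add(b); neighbors[b].add(a)
    out = {}
    for s, nbrs in neighbors.items():
        votes = [sp_class[s]] + [sp_class[n] for n in nbrs]
        out[s] = max(set(votes), key=votes.count)
    return out
```

For example, on a 64x64 image split into four 32x32 superpixels, a single superpixel labeled differently from all of its neighbors is overruled by the majority vote, which is the intuition behind replacing a superpixel label through the connectivity of its neighborhood.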
Notes
A ‡ symbol expresses a significant difference between our approach (CNN-SP+RGB) and the semantic segmentation approaches (GFL, ID3-Depth-1, ID3-Depth-2, CNN-SP+D, CNN-SP+DGT, and HLS-GNet)
References
Saxena A, Sun M, Ng AY (2009) Make3D: learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp 824–840. https://doi.org/10.1109/TPAMI.2008.132
Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Susstrunk S (2010) SLIC superpixels. EPFL
Aguilar-González A, Arias-Estrada M, Berry F (2019) Depth from a motion algorithm and a hardware architecture for smart cameras. Sensors MDPI. https://doi.org/10.3390/s19010053
Alhashim I, Wonka P (2019) High quality monocular depth estimation via transfer learning. arXiv:1812.11941
Mičušík B, Wildenauer H, Košecká J (2008) Detection and matching of rectilinear structures. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–7
Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
Chen T, Liu X, Feng R, Wang W, Yuan C, Lu W, He H, Gao H, Ying H, Chen DZ, Wu J (2021) Discriminative cervical lesion detection in Colposcopic images with global class activation and local bin excitation. IEEE Journal of Biomedical and Health Informatics (JBHI). https://doi.org/10.1109/JBHI.2021.3100367
Chen J, Ying H, Liu X, Gu J, Feng R, Chen T, Gao H, Wu J (2021) A transfer learning based super-resolution microscopy for biopsy slice images: the joint methods perspective. IEEE/ACM transactions on computational biology and Bioinformatics (TCBB) 18(1):103–113. https://doi.org/10.1109/TCBB.2020.2991173
Hoiem D, Efros AA, Hebert M (2007) Recovering surface layout from an image. International Journal of Computer Vision, pp 151–172. https://doi.org/10.1007/s11263-006-0031-y
Hoiem D, Efros AA, Hebert M (2005) Geometric context from a single image. IEEE International Conference on Computer Vision (ICCV), pp 654–661
E M, Y C, J M (2011) Single image augmented reality using planar structures in urban environments. In: Machine vision and image processing conference (IMVIP), pp 1–6
Everingham M, Eslami SMA, Gool LV, Williams CKI, Winn J, Zisserman A (2015) The pascal visual object classes challenge a retrospective. Int J Comput Vis 111:98–136. https://doi.org/10.1007/s11263-014-0733-5
Feng R, Liu X, Chen J, Chen DZ, Gao H, Wu J (2021) A deep learning approach for colonoscopy pathology WSI analysis: accurate segmentation and classification. IEEE J Biomed Health Inform 25(10):3700–3708. https://doi.org/10.1109/JBHI.2020.3040269
Gao H, Xu K, Cao M, Xiao J, Xu Q, Yin Y (2021) The deep features and attention mechanism based method to dish Healthcare under social IoT systems: an empirical study with a hand-deep local-global net. IEEE Transactions on Computational Social Systems (TCSS). https://doi.org/10.1109/TCSS.2021.3102591
Hashim HA (2021) A geometric nonlinear stochastic filter for simultaneous localization and mapping. Aerospace Science and Technology. https://doi.org/10.1016/j.ast.2021.106569
Hu MK (1962) Visual pattern recognition by moment invariants. IRE Transaction Information Theory, pp 179–187
Huang J, Zhou Y, Funkhouser T, Guibas L (2019) FrameNet: learning local canonical frames of 3D surfaces from a single rgb image. IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2019.00873
Joo K, Oh TH, Kim J, Kweon IS (2019) Robust and globally optimal manhattan frame estimation in near real time. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp 682–696. https://doi.org/10.1109/TPAMI.2018.2799944
Kang Z, Yang J, Yang Z, Cheng S (2020) A review of techniques for 3d reconstruction of indoor environments. International Journal of Geo-Information (ISPRS), pp 1–31. https://doi.org/10.3390/ijgi9050330
Kim P, Coltin B, Kim HJ (2018) Linear RGB-D SLAM for planar environments. European Conference on Computer Vision (ECCV), pp 333–348
Košecká J, Zhang W (2005) Extraction, matching, and pose recovery based on dominant rectangular structures. Comput Vis Image Underst 100(3):274–293
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. European Conference on Computer Vision, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Liu F, Shen C, Lin G, Reid I (2016) Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans Pattern Anal Mach Intell 38:2024–2039. https://doi.org/10.1109/TPAMI.2015.2505283
Liu S, Zhou Y, Zhao Y (2021) VaPiD: a rapid vanishing point detector via learned optimizers. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp 12859–12868
Luo S, Wei H (2021) Diffusion probabilistic models for 3D point cloud generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2837–2845
Mahmoud MH, Alamery S, Fouad H, Altinawi A, Youssef AE (2021) An automatic detection system of diabetic retinopathy using a hybrid inductive machine learning algorithm. Personal and Ubiquitous Computing. https://doi.org/10.1007/s00779-020-01519-8
Haines O, Calway A (2012) Estimating planar structure in single images by learning from examples. International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp 289–294
Haines O, Calway A (2015) Recognising planes in a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp 1849–1861. https://doi.org/10.1109/TPAMI.2014.2382097
Osuna-Coutiño JAdJ, Cruz-Martínez C, Martinez-Carranza J, Arias-Estrada M, Mayol-Cuevas W (2016) I want to change my floor: dominant plane recognition from a single image to augment the scene. IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp 135–140. https://doi.org/10.1109/ISMAR-Adjunct.2016.0060
Osuna-Coutiño JAdJ, Martinez-Carranza J (2019) High level 3D structure extraction from a single image using a CNN-based approach. Sensors. https://doi.org/10.3390/s19030563
Osuna-Coutiño JAdJ, Martinez-Carranza J (2019) A binary descriptor invariant to rotation and robust to noise (BIRRN) for floor recognition. Springer Mexican Conference on Pattern Recognition (MCPR) 11524:271–281. https://doi.org/10.1007/978-3-030-21077-9_25
Osuna-Coutiño JAdJ, Martinez-Carranza J (2019) Binary-patterns based floor recognition suitable for urban scenes. IEEE International Conference on Control, Decision and Information Technologies (CoDIT). https://doi.org/10.1109/CoDIT.2019.8820296
Osuna-Coutiño JAdJ, Martinez-Carranza J (2020) Structure extraction in urbanized aerial images from a single view using a CNN-based approach. International Journal of Remote Sensing, pp 1–25. https://doi.org/10.1080/01431161.2020.1767821
Osuna-Coutiño JAdJ, Martinez-Carranza J (2021) Volumetric structure extraction in a single image. The Visual Computer. Springer, https://doi.org/10.1007/s00371-021-02163-w
Osuna-Coutiño JAdJ, Martinez-Carranza J, Arias-Estrada M, Mayol-Cuevas W (2016) Dominant plane recognition in interior scenes from a single image. International Conference on Pattern Recognition (ICPR), pp 1923–1928
Peng X, Zhu X, Wang T, Ma Y (2022) SIDE: center-based stereo 3D detector with structure-aware instance depth estimation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 119–128
Ren Z, Lee YJ (2018) Cross-domain self-supervised multi-task feature learning using synthetic imagery. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 762–771
Rosen DM, Doherty KJ, Terán Espinoza A, Leonard JJ (2021) Advances in inference and representation for simultaneous localization and mapping. Annual Review of Control, Robotics, and Autonomous Systems, pp 215–242. arXiv:2103.05041
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv:1609.04747
Saxena A, Chung SH, Ng AY (2005) Learning depth from single monocular images. Advances in Neural Information Processing Systems NIPS
Shen X, Cohen S, Wang P, Russell B, Price B, Eisenmann J (2019) Planar region guided 3D geometry estimation from a single image. Patent
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations. arXiv:1409.1556
Uhrig J, Schneider N, Schneider L, Franke U, Brox T, Geiger A (2017) Sparsity invariant CNNs. International Conference on 3D Vision (3DV). KITTI Dataset. https://doi.org/10.1109/3DV.2017.00012
Wang C, Cheng M, Sohel F, Bennamoun M, Li J (2019) NormalNet: a voxel-based CNN for 3D object classification and retrieval. Neurocomputing, pp 139–147. https://doi.org/10.1016/j.neucom.2018.09.075
Wang X, Fouhey D, Gupta A (2015) Designing deep networks for surface normal estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 539–547
Xiang Y, Schmidt T, Narayanan V, Fox D (2018) PoseCNN: a convolutional neural network for 6D Object pose estimation in cluttered scenes conference: robotics: science and systems. https://doi.org/10.15607/RSS.2018.XIV.019
Xiao J, Xu H, Gao H, Bian M, Li Y (2021) A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17, 1s, Article 15, 19 pages. https://doi.org/10.1145/3419842
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE, pp 2278–2324
Yu X, Tang L, Rao Y, Huang T, Zhou J, Lu J (2022) Point-bert: pre-training 3d point cloud transformers with masked point modeling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 19313-19322. https://doi.org/10.48550/arXiv.2111.14819
Zhao R, Pang M, Liu C, Zhang Y (2019) Robust normal estimation for 3D LiDAR point clouds in urban environments. Sensors MDPI. https://doi.org/10.3390/s19051248
Zhu Y, Zhang W, Chen Y, Gao H (2019) A novel approach to workload prediction using attention-based LSTM encoder-decoder network in cloud environment. EURASIP Journal on Wireless Communications and Networking, 2019(247). https://doi.org/10.1186/s13638-019-1605-z
Zhu H, Zuo X, Yang H, Wang S, Cao X, Yang R (2021) Detailed avatar recovery from single image. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2021.3102128
Funding
The first author is grateful for the support of the internal call for collaboration scholarships INAOE 2021-2002. The second author is grateful for the support received through his Royal Society-Newton Advanced Fellowship, reference NA140454.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
We confirm that this work is original and has not been published elsewhere nor is it currently under consideration for publication elsewhere.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Osuna-Coutiño, J.d.J., Martinez-Carranza, J. High level structure recognition in single urban images using a CNN and SuperPixels. Multimed Tools Appl 82, 25175–25196 (2023). https://doi.org/10.1007/s11042-023-14422-0
DOI: https://doi.org/10.1007/s11042-023-14422-0