High level structure recognition in single urban images using a CNN and SuperPixels

Multimedia Tools and Applications

Abstract

High-Level Structure (HLS) recognition locates elements on human-made surfaces (objects, buildings, ground, etc.). Several approaches to HLS recognition exist, but most of them process 3D data in the form of point clouds extracted from camera images. In general, 3D point cloud approaches perform well for certain scenes captured as video or image sequences, but they need sufficient parallax to guarantee accuracy. An alternative is to process a single RGB image, interpreting the image areas where human-made structure may be observed. This removes the parallax dependency but adds the challenge of interpreting image ambiguities correctly. Motivated by this, we present a novel methodology for HLS recognition from a single image using a CNN-superpixel approach. Our approach has three steps. First, the superpixel and centroid analysis obtains the superpixel to analyze and its RGB section, a portion of the input image that our CNN uses to provide a label. Second, the structure recognition step provides a segmentation, location, and delimitation of the urbanized structures in the scene. For this step, we propose a CNN-superpixel configuration that combines the abstraction power of deep learning with the fast computational processing of superpixel segmentation. Third, the connectivity analysis replaces a superpixel's label by considering its connections with neighboring superpixels. Experimental results are encouraging: our approach achieves high performance under real-world scenarios. Also, the proposed methodology is 6.53 to 12.18 faster than previous work.
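
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes a regular grid as a stand-in for a real superpixel algorithm, a brightness threshold as a placeholder for the trained CNN, and a simple majority vote among 4-connected neighbours as the connectivity rule, since the abstract does not specify any of these details.

```python
import numpy as np

def grid_superpixels(height, width, cell):
    """Stand-in for superpixel segmentation (assumption): one label per grid cell."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = (width + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]

def centroid_patches(image, segments, half):
    """Step 1: crop an RGB section centred on each superpixel's centroid."""
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    patches = {}
    for sp in np.unique(segments):
        ys, xs = np.nonzero(segments == sp)
        cy, cx = int(round(ys.mean())), int(round(xs.mean()))
        # Offsets already include the padding, so (cy, cx) is the patch centre.
        patches[int(sp)] = padded[cy:cy + 2 * half + 1, cx:cx + 2 * half + 1]
    return patches

def classify_patch(patch):
    """Step 2 placeholder (assumption): a real system would call the CNN here."""
    return int(patch.mean() > 0.5)

def neighbour_map(segments):
    """Superpixels that share a horizontal or vertical pixel border."""
    pairs = np.concatenate([
        np.stack([segments[:, :-1].ravel(), segments[:, 1:].ravel()], axis=1),
        np.stack([segments[:-1, :].ravel(), segments[1:, :].ravel()], axis=1),
    ])
    pairs = np.unique(pairs[pairs[:, 0] != pairs[:, 1]], axis=0)
    adjacency = {int(sp): set() for sp in np.unique(segments)}
    for a, b in pairs:
        adjacency[int(a)].add(int(b))
        adjacency[int(b)].add(int(a))
    return adjacency

def connectivity_analysis(labels, adjacency):
    """Step 3 (assumed rule): relabel when a strict majority of neighbours disagree."""
    refined = dict(labels)
    for sp, neighbours in adjacency.items():
        votes = [labels[n] for n in neighbours]
        disagree = sum(v != labels[sp] for v in votes)
        if votes and disagree * 2 > len(votes):
            refined[sp] = max(set(votes), key=votes.count)
    return refined

def recognise_structure(image, cell=8, half=4):
    """Run the three steps and return a per-pixel label mask."""
    segments = grid_superpixels(image.shape[0], image.shape[1], cell)
    patches = centroid_patches(image, segments, half)
    labels = {sp: classify_patch(p) for sp, p in patches.items()}
    refined = connectivity_analysis(labels, neighbour_map(segments))
    lookup = np.array([refined[sp] for sp in range(len(refined))])
    return lookup[segments]
```

In this sketch the classifier runs once per superpixel rather than once per pixel, which is one plausible reading of why a CNN-superpixel combination reduces processing time.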

Notes

  1. A ‡ symbol expresses a significant difference between our approach (CNN-SP+RGB) and the semantic segmentation approaches (GFL, ID3-Depth-1, ID3-Depth-2, CNN-SP+D, CNN-SP+DGT, and HLS-GNet).

Funding

The first author is thankful for the internal call for collaboration scholarships INAOE 2021-2002. The second author is thankful for the support received through his Royal Society-Newton Advanced Fellowship with reference NA140454.

Author information

Corresponding author

Correspondence to Jose Martinez-Carranza.

Ethics declarations

Conflict of Interests

We confirm that this work is original and has not been published elsewhere nor is it currently under consideration for publication elsewhere.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Osuna-Coutiño, J.d.J., Martinez-Carranza, J. High level structure recognition in single urban images using a CNN and SuperPixels. Multimed Tools Appl 82, 25175–25196 (2023). https://doi.org/10.1007/s11042-023-14422-0

  • DOI: https://doi.org/10.1007/s11042-023-14422-0
