A robust three-stage approach to large-scale urban scene recognition

Wang, Jinglu; Lu, Yonghua; Liu, Jingbo; Quan, Long

doi:10.1007/s11432-017-9178-8

Published: 06 September 2017

Volume 60, article number 103101 (2017)

Science China Information Sciences

Jinglu Wang¹,
Yonghua Lu²,
Jingbo Liu¹ &
Long Quan¹


Abstract

To obtain a high-level description of urban scenes, we propose a three-stage approach to recognizing 3D reconstructed scenes with efficient representations. First, we develop a joint semantic labeling method that labels the triangular mesh-based representation by exploiting both image features and geometric features. The labeling is formulated over a conditional random field (CRF) that incorporates local spatial smoothness and multi-view consistency. Second, based on the labeled reconstructed meshes, we refine the segmentation of man-made objects in the recomposed global orthographic map with a graph partition algorithm, and propagate the coherent segmentation to the entire 3D mesh. Finally, we generate a compact, abstracted geometric representation for each man-made object that is more visually appealing than the original cluttered model. This abstraction algorithm also leverages a CRF formulation to partition building footprints into minimal sets of structural linear features, which are then used to construct profiles for large-scale scenes. The proposed approach robustly handles reconstructions with poor geometry and connectivity, thanks to the higher-order CRF formulations, which impose the regularity priors that are ubiquitous in urban scenes. Each stage performs an independent, decoupled task. Extensive experiments demonstrate the superior performance of our approach in robustness, accuracy, and applicability.
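
To make the idea of CRF labeling with a smoothness prior concrete, the following is a minimal sketch and not the authors' method. It assumes a simplified pairwise Potts model over mesh faces and uses iterated conditional modes (ICM) as a stand-in for inference; the abstract does not state which inference procedure the paper uses, and the higher-order and multi-view terms are omitted here. The function names `potts_energy` and `icm_labeling`, the label ids, and all cost values are hypothetical.

```python
from typing import List, Sequence, Tuple


def potts_energy(labels: Sequence[int],
                 unary: Sequence[Sequence[float]],
                 edges: Sequence[Tuple[int, int]],
                 smooth_weight: float) -> float:
    """Energy = sum of per-face data costs + a penalty per disagreeing edge."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    return data + smooth


def icm_labeling(unary: Sequence[Sequence[float]],
                 edges: Sequence[Tuple[int, int]],
                 smooth_weight: float = 1.0,
                 max_sweeps: int = 10) -> List[int]:
    """Greedy local minimization (ICM) of the Potts energy above."""
    n = len(unary)
    num_labels = len(unary[0])

    # Adjacency list of the face graph (faces sharing a mesh edge).
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the label with the lowest data cost at each face.
    labels = [min(range(num_labels), key=lambda l: unary[i][l]) for i in range(n)]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_cost(l: int) -> float:
                disagree = sum(1 for j in neighbors[i] if labels[j] != l)
                return unary[i][l] + smooth_weight * disagree

            best = min(range(num_labels), key=local_cost)
            # Only accept strict improvements so the energy never increases.
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical data: five faces in a chain, label 0 = "building", 1 = "ground".
# Face 2 has a noisy classifier score that slightly prefers "ground".
unary_costs = [[0.1, 2.0], [0.2, 2.0], [1.0, 0.8], [0.1, 2.0], [0.3, 2.0]]
face_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(icm_labeling(unary_costs, face_edges, smooth_weight=1.0))  # [0, 0, 0, 0, 0]
```

With the smoothness weight set to zero, the noisy face keeps its isolated label; with a positive weight, agreement with its neighbors outweighs the weak data preference. ICM only finds a local minimum, which is one reason practical systems tend to use stronger inference such as graph cuts.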

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Jinglu Wang, Jingbo Liu & Long Quan
School of Resource and Environmental Sciences, Wuhan University, Wuhan, 430072, China
Yonghua Lu

Corresponding author

Correspondence to Jinglu Wang.