Automatic description of complex buildings from multiple images

doi:10.1016/j.cviu.2004.05.004

Computer Vision and Image Understanding

Volume 96, Issue 1, October 2004, Pages 60-95

Abstract

We present an approach for detecting and describing complex buildings with flat or complex rooftops by using multiple, overlapping images of the scene. We find 3-D rooftop boundary hypotheses from the line and junction features of the images by applying consecutive grouping procedures. First, 3-D features are generated by grouping image features over multiple images, and rooftop hypotheses are then generated by neighborhood searches on those features. Probabilistic reasoning, level-of-detail techniques, and cues from image-derived, unedited elevation data are used at various stages to manage the huge search space for rooftop boundary hypotheses. The 3-D rooftop hypotheses generated by these procedures are verified with evidence collected from the images and the elevation data. Expandable Bayesian networks are used to combine evidence from multiple images. Finally, overlap and rooftop analyses are performed to find the final building models. Experimental results are shown on complex buildings.

Introduction

Three-dimensional object description is a key task in computer vision. One practical application of 3-D object description is building detection and description from aerial images. It can greatly improve the automation of 2-D or 3-D map generation, and such maps are used in various applications including radiowave reachability tests for wireless communications, computer graphics, virtual reality, and mission planning.

Building detection and description has been an active research area [6]. Early systems used a single intensity image and were effective for simple buildings [9], [11], [14], [15]. In general, multiple aerial images can be obtained at a small extra cost. Thus, most recent work in building detection has focused on stereo or multi-view analysis [1], [2], [3], [7], [17], [18].

There are several challenges for the building detection and description problem.

•
Figure–ground separation: we deal with outdoor images, in which it is hard to separate building boundaries from other distracting lines such as road boundaries. Moreover, lines and corners of buildings are often broken or missing due to occlusion or accidental alignments.
•
Representation: as in other 3-D object description problems, the model representation plays an important role in building detection and description. With a simpler representation, such as extrusions of rectangular rooftops [14], [17], the description result is more robust, but the detection rate is lower¹ because of the limited representational power. Conversely, with a model of very high representational power, such as refined polygonal meshes [1], [2], many more buildings can be described, but the result is less robust and the geometric information obtained is so poor (only polygonal meshes) that the usability of the result is very limited. It is therefore important to find a representation that has high enough representational power and rich geometric information and, at the same time, admits robust and computationally affordable detection and description algorithms.
•
Information fusion: the types of available information vary depending on the application. In most cases, stereo or multiple images are available. Range data can be generated from stereo analysis, but its quality is not good enough to generate building hypotheses directly, because many building roofs lack sufficient texture for stereo processing. In addition, nearby trees of similar height make such range data difficult to use. Sometimes accurate range data, such as LIDAR, are available (at high cost). There have been efforts to maximize the use of such high-quality data [1], [4] or to increase the quality of image-based range data by using more than 10 images [20]. In this paper, we focus on the use of image-derived, unedited range data. In this case, how to combine information from the images with the low-quality range data is a key question.

We present an approach for detecting and describing complex buildings by using multiple, overlapping images of the scene. We use low-quality range data, such as that from fully automated stereo processing, as additional information. We apply the hypothesize-and-verify paradigm, in which lower-level features are grouped (hypothesized) into higher-level ones and then filtered (verified) to keep the computation manageable (otherwise, it would grow exponentially). We present a unique feature grouping approach for lines and junctions in which the low-level properties (and uncertainties) are kept up to the highest level. For example, a 3-D line feature is not just two end-points synthesized from 2-D line features; it also includes the actual set of 2-D line features (“member” features), and we use the properties of the member features extensively in higher-level grouping.
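To make the “member” idea concrete, here is a minimal Python sketch of a 3-D linear that stores its supporting 2-D segments alongside its 3-D end-points. The class and method names (`Line2D`, `Line3D`, `views`, `member_length`) are hypothetical and do not come from ABERS; the actual feature records, including the uncertainties mentioned above, would be richer than this.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


@dataclass
class Line2D:
    """A 2-D line segment detected in one image (hypothetical structure)."""
    image_id: int
    p: Point2
    q: Point2

    def length(self) -> float:
        dx, dy = self.q[0] - self.p[0], self.q[1] - self.p[1]
        return (dx * dx + dy * dy) ** 0.5


@dataclass
class Line3D:
    """A 3-D linear that keeps its 2-D "member" features, not just two end-points."""
    p: Point3
    q: Point3
    members: List[Line2D] = field(default_factory=list)

    def views(self) -> Set[int]:
        # Images in which this 3-D linear has at least one member segment.
        return {m.image_id for m in self.members}

    def member_length(self, image_id: int) -> float:
        # Total length of member segments in one image; a higher-level
        # grouping step could read evidence like this directly from members.
        return sum(m.length() for m in self.members if m.image_id == image_id)


# Usage: one 3-D linear supported by two segments in image 0 and one in image 1.
edge = Line3D(
    p=(0.0, 0.0, 10.0),
    q=(5.0, 0.0, 10.0),
    members=[
        Line2D(0, (0.0, 0.0), (3.0, 4.0)),
        Line2D(0, (10.0, 0.0), (10.0, 2.0)),
        Line2D(1, (1.0, 1.0), (1.0, 3.0)),
    ],
)
print(edge.views(), edge.member_length(0))  # {0, 1} 7.0
```

Keeping the members, rather than collapsing them into end-points, is what lets a later stage ask per-image questions (how much of this edge is actually visible in image 1?) without re-matching.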

To reduce the computation, we apply a level-of-detail technique with probabilistic relaxation and introduce various techniques for efficient filtering, such as information fusion (with range data), probabilistic height estimation, and expandable Bayesian networks [12]. Our approach shows good description results on complex buildings. Our system detects and describes buildings with polygonal boundaries and complex roofs (including superstructures). Such a level of complexity is unprecedented in previous work (among approaches that do not use high-quality range data).

Section snippets

Representation and approach

We apply a model-based approach to obtain high-level geometric information and robust detection results. Usually, in model-based approaches, when the complexity of a building model (for example, the number of rooftop corners allowed) increases, the search space for the grouping increases exponentially. Hence, many practical building description systems have used simple models such as collections of rectangular rooftops [17] or simple blocks with gable roofs [16]. While collections of

Preprocessing

We use an image-derived, unedited DEM (at about half the image resolution) generated by the commercial “SocetSet” product from BAE Systems. Although DEM data do not give an explicit building model, as shown in Fig. 4, they can still give a rough idea of where the buildings are located. Thus, we follow the approach of Huertas et al. [10] to generate rough cues from a DEM image for further processing. The DEM image is first convolved with a Laplacian-of-Gaussian filter to smooth the image and locate
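The first preprocessing step can be illustrated with a small, dependency-free sketch of a Laplacian-of-Gaussian (LoG) filter applied to a toy DEM. The kernel parameters (`sigma`, `radius`), the zero padding at the borders, and the toy plateau are assumptions for illustration only; the remaining steps of the Huertas et al. [10] procedure are not reproduced. An elevated blob yields a negative LoG response inside and a positive response just outside, so the sign change approximates its boundary.

```python
import math
from typing import List

Grid = List[List[float]]


def log_kernel(sigma: float, radius: int) -> Grid:
    """Discrete Laplacian-of-Gaussian kernel, shifted to zero mean."""
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            s = (dx * dx + dy * dy) / (2.0 * sigma * sigma)
            row.append(-(1.0 / (math.pi * sigma ** 4)) * (1.0 - s) * math.exp(-s))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / float(len(kernel) ** 2)
    # Zero mean: a region of constant height gives zero response away from the borders.
    return [[v - mean for v in row] for row in kernel]


def convolve(img: Grid, kernel: Grid) -> Grid:
    """Same-size 2-D convolution with zero padding (the LoG kernel is symmetric)."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


# Toy DEM: a 3x3 "building" 10 units high on flat ground.
dem = [[10.0 if 3 <= i <= 5 and 3 <= j <= 5 else 0.0 for j in range(9)] for i in range(9)]
response = convolve(dem, log_kernel(sigma=1.0, radius=3))
print(response[4][4] < 0, response[1][4] > 0)  # inside: negative, just outside: positive
```

In practice, a library routine such as `scipy.ndimage.gaussian_laplace` would replace the hand-written loops, and border handling matters for real terrain, where heights outside the image are not zero.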

Three-dimensional feature grouping

In ABERS, two types of 3-D features are used: 3-D linears and 3-D junctions. A 3-D feature is a group of 2-D features from different views. To obtain 3-D features, we find pair-wise matches of 2-D features using epipolar geometry, and group the matched features among different views. The height of a 3-D feature is estimated from the pair-wise height estimates.
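This excerpt does not say how the pair-wise height estimates are combined. One common choice, assumed here purely for illustration, is an inverse-variance weighted mean, where each image pair contributes a height and a variance (for instance, larger when a line is nearly parallel to the epipolar direction). The function name `fuse_heights` and the variance model are hypothetical.

```python
from typing import Sequence, Tuple


def fuse_heights(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine pair-wise (height, variance) estimates by inverse-variance weighting.

    Returns the fused height and its variance. Assumes independent estimates
    with strictly positive variances.
    """
    if not estimates:
        raise ValueError("need at least one pair-wise height estimate")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    height = sum(w * h for w, (h, _) in zip(weights, estimates)) / total
    return height, 1.0 / total


# Usage: two image pairs; the second is less reliable (larger variance).
print(fuse_heights([(10.0, 1.0), (12.0, 3.0)]))  # approximately (10.5, 0.75)
```

The less reliable pair pulls the estimate only a quarter of the way toward its value, and the fused variance is smaller than either input, which is the behavior one would want when more views agree.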

ABERS has a unique grouping strategy where the properties of low-level features are utilized in the grouping procedure of several

Rooftop boundary hypotheses generation

ABERS operates in two modes: one detects flat rooftops only (which takes less time); the other also handles sloping roofs. For flat rooftops, we use DEM layers (Section 3). We apply hypothesis generation, verification (Section 6), and overlap analysis (Section 7) repeatedly for each DEM layer. For sloping roofs, we assume that the outer boundaries (eaves) are parallel to the ground (Section 2). Therefore, to generate the rooftop boundary hypothesis of a sloping roof, we apply the same

Hypotheses verification

Once rooftop hypotheses are obtained, supporting evidence is collected for them. This consists of line support, wall vertical line support, darkness of the cast shadow region, and closeness of a hypothesis to the boundary of a DEM layer.

Line support consists of the supporting (RP) and the distracting (RN) line evidence. We use the line scoring function given in [17]. Given a 3-D rooftop boundary hypothesis, its projection (a 2-D polygon) onto each image is calculated. For each side of the polygon,
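The projection step can be sketched with a standard 3x4 pinhole camera matrix in homogeneous coordinates. The sensor model used in the paper is not given in this excerpt, so the matrix `P`, the toy camera, and the helper names below are assumptions; the sketch stops where the line scoring function of [17] would take over, producing the polygon sides to be scored in each image.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Matrix34 = Sequence[Sequence[float]]


def project(P: Matrix34, X: Point3) -> Point2:
    """Project a 3-D point with a 3x4 pinhole camera matrix P (homogeneous coordinates)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w == 0:
        raise ValueError("point lies on the camera's principal plane")
    return u / w, v / w


def project_polygon(P: Matrix34, vertices: Sequence[Point3]) -> List[Point2]:
    """Project every vertex of a 3-D rooftop boundary hypothesis into one image."""
    return [project(P, X) for X in vertices]


def polygon_sides(polygon: Sequence[Point2]) -> List[Tuple[Point2, Point2]]:
    """Consecutive vertex pairs, including the closing side."""
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]


# Usage: a toy camera with focal length 100 looking along +z, and a square roof at depth 10.
P = [[100.0, 0.0, 0.0, 0.0], [0.0, 100.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
roof = [(0.0, 0.0, 10.0), (10.0, 0.0, 10.0), (10.0, 10.0, 10.0), (0.0, 10.0, 10.0)]
image_polygon = project_polygon(P, roof)
for side in polygon_sides(image_polygon):
    print(side)  # each side would then be scored for supporting/distracting lines
```

Repeating this per image, with one camera matrix per view, yields one 2-D polygon per image for the same 3-D hypothesis.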

Overlap analysis

Often, more than one hypothesis is verified for a single building component, where these hypotheses represent parts of an actual building, as in Fig. 26B. We aim to choose the best possible building component. However, comparing two verified hypotheses by their verification score, P(Building | Evidence) of the EBN in Fig. 25B, is not accurate, because that binary classifier is not designed and trained to compare two good building hypotheses but to determine whether a certain
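The verification score P(Building | Evidence) comes from an expandable Bayesian network (EBN) [12], whose structure is not reproduced in this excerpt. As a much simpler stand-in, the sketch below fuses per-view evidence under a naive-Bayes assumption (views conditionally independent given the class), reusing one likelihood table for every view so that the same model applies to any number of images. The evidence levels, the table `LIKELIHOOD`, and the function `p_building` are hypothetical placeholders, not values or names from the paper.

```python
import math
from typing import Dict, Sequence

# Hypothetical per-view likelihoods P(evidence level | class); a real system
# would learn such tables from training data.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "building":     {"strong": 0.6, "weak": 0.3, "none": 0.1},
    "non_building": {"strong": 0.1, "weak": 0.3, "none": 0.6},
}


def p_building(evidence: Sequence[str], prior: float = 0.5) -> float:
    """P(Building | evidence from all views), assuming conditionally independent views."""
    log_b = math.log(prior)
    log_n = math.log(1.0 - prior)
    for e in evidence:  # the same table is reused for every view
        log_b += math.log(LIKELIHOOD["building"][e])
        log_n += math.log(LIKELIHOOD["non_building"][e])
    m = max(log_b, log_n)  # subtract the max before exp for numerical stability
    b, n = math.exp(log_b - m), math.exp(log_n - m)
    return b / (b + n)


# Usage: two views with strong support and one ambiguous view.
print(round(p_building(["strong", "strong", "weak"]), 3))  # 0.973
```

As the text above notes, a binary score like this is not by itself a reliable way to rank two hypotheses that are both likely to be buildings.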

Superstructure analysis

For a multi-layered building complex, we need to consider the interaction among building components. Consider the building complex shown in Fig. 29A. A rooftop boundary hypothesis for the superstructure can be found with the proposed approach, but it will have weak wall and shadow evidence if the estimation of the shadow and the wall does not consider the interaction with the base building. Therefore, it is desirable to first find the base building for accurate verification of the

Time complexity

To estimate the time complexity of ABERS, two factors are considered: the average number of 2-D linears per image, l, and the number of images, n. The number of junctions is usually much smaller than the number of linears (Section 3.2), so it is bounded by O(l). The number of linear matches in one image pair is O(l) when the possible height ranges are fixed. The actual numbers vary according to the image configuration, for example, alignment of epipolar lines and building sides and the complexity
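The pair-wise matching count follows directly from the statements above, assuming every image pair is matched: with n images there are n(n-1)/2 pairs, each contributing O(l) linear matches. This covers only the pair-wise matching stage; the complete complexity analysis of ABERS is in the full text and is not reproduced here.

```latex
N_{\text{pairs}} = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
N_{\text{linear matches}} = N_{\text{pairs}} \cdot O(l) = O(n^{2}\, l)
```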

Experimental results

We show results on several examples in this section. Unfortunately, it is difficult to acquire large data sets with multiple-image coverage for a valid statistical evaluation. In addition, most building detection and description systems differ in representational power, and statistical evaluation on a small number of examples is less meaningful when the results depend strongly on the choice of test dataset.

We first show the results on flat buildings. Fig. 35A is the detection

Conclusion

We have presented an approach to the detection and description of buildings with complex-shaped rooftops and shown results on some challenging examples. The problem of modeling complex buildings retains many complexities requiring substantial future research, but we believe that this work points to a promising approach. Our method uses multiple images and multiple cues, such as results obtained by region-matching stereo analysis and feature-based matching. We have described perceptual grouping

Acknowledgment

This research was supported by a MURI subgrant from Purdue University under Army Research Office Grant No. DAAH04-96-1-0444. Part of the low-level processing (Section 3 and Section 4.2.2) is a result of joint research with Andres Huertas.

References (20)

B. Ameri, Feature based model verification (FBMV): a new concept for validation in building reconstruction, in: Proc....
C. Baillard, A. Zisserman, Automatic reconstruction of piecewise planar models from multiple views, in: Proc. IEEE...
R. Collins et al., The ascender system: automated site modeling from multiple aerial images, Comput. Vision Image Understand. (1998).
M. Cord, M. Jordan, J.-P. Cocquerez, N. Paparoditis, Automatic extraction and modelling of urban buildings from high...
A. Fischer et al., Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D, Comput. Vision Image Understand. (1998).
A. Gruen, R. Nevatia (Eds.), Computer Vision and Image Understanding: Special Issue on Automatic Building Extraction...
M. Herman et al., Incremental reconstruction of 3-D scenes from multiple, complex images, Artif. Intell. (1986).
S. Heuel, W. Förstner, Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective...
A. Huertas et al., Detecting buildings in aerial images, Comput. Vision Graph. Image Process. (1988).
C. Lin et al., Building detection and description from a single intensity image, Comput. Vision Image Understand. (1998).
