Automatic description of complex buildings from multiple images

https://doi.org/10.1016/j.cviu.2004.05.004

Abstract

We present an approach for detecting and describing complex buildings with flat or complex rooftops by using multiple, overlapping images of the scene. We find 3-D rooftop boundary hypotheses from the line and junction features of the images by applying consecutive grouping procedures. First, 3-D features are generated by grouping image features over multiple images, and rooftop hypotheses are generated by neighborhood searches on those features. Probabilistic reasoning, level-of-detail, and cues from image-derived unedited elevation data are used at various stages to manage the huge search space for rooftop boundary hypotheses. Three-dimensional rooftop hypotheses generated by the above procedures are verified with evidence collected from the images and the elevation data. Expandable Bayesian networks are used to combine evidence from multiple images. Finally, overlap and rooftop analyses are performed to find the final building models. Experimental results are shown on complex buildings.

Introduction

Three-dimensional object description is a key task of computer vision. One practical application of 3-D object description is building detection and description from aerial images. It can greatly improve the automation of 2-D or 3-D map generation, which can be used in various applications including radiowave reachability tests for wireless communications, computer graphics, virtual reality, and mission planning.

Building detection and description has been an active research area [6]. Early systems used a single intensity image and were effective for simple buildings [9], [11], [14], [15]. In general, multiple aerial images can be obtained at a small extra cost. Thus, most of the recent work in building detection has focused on stereo or multi-view analysis [1], [2], [3], [7], [17], [18].

There are several challenges for the building detection and description problem.

  • Figure–ground separation: we deal with outdoor images, and it is hard to separate building boundaries from other distracting lines such as road boundaries. Moreover, lines and corners of buildings are often broken and missing due to occlusion or other accidental alignments.

  • Representation: as in other 3-D object description problems, the model representation plays an important role in the building detection and description problem. When we use a simpler representation, such as extrusions of rectangular rooftops [14], [17], the description result will be more robust but the detection rate will be lower because of its limited representational power. Conversely, when we use a model of extremely high representational power, such as refined polygonal meshes [2], [1], we can describe many more buildings, but the result will be less robust and the level of geometric information we obtain will be so poor (we get only polygonal meshes) that the usability of the result will be very limited. It is very important to find a representation that has sufficiently high representational power and rich geometric information, and for which a robust and computationally affordable detection and description algorithm is available.

  • Information fusion: the types of available information vary depending on the application. In most cases, stereo or multiple images are available. Range data can be generated from stereo analysis, but its quality is not good enough to generate building hypotheses directly because many building roofs lack sufficient texture for stereo processing. In addition, nearby trees of similar height make the use of such range data difficult. Sometimes, accurate range data, such as LIDAR, are available (at high cost). There have been efforts to maximize the use of such high-quality data [1], [4] or to increase the quality of the image-based range data with more than 10 images [20]. In this paper, we focus on the use of image-derived unedited range data. In this case, how to combine information from the images and the low-quality range data is very important.

We present an approach for detecting and describing complex buildings by using multiple, overlapping images of the scene. We use low-quality range data, such as from fully automated stereo processing, as additional information. We apply a hypothesize-and-verify paradigm, in which lower-level features are grouped (hypothesized) into higher-level ones and then filtered (verified) to limit the computation, which would otherwise grow exponentially. We present a unique grouping approach for line and junction features in which the low-level properties (and uncertainties) are carried up to the highest level. For example, a 3-D line feature is not just two end-points synthesized from 2-D line features, but also includes the actual set of 2-D line features (“member” features), and we make intensive use of the properties of the member features in the higher-level grouping.
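As an illustration only, the idea of a 3-D line feature that retains its member 2-D features might be sketched in Python as follows. The class names, fields, and helper methods (Line2D, Line3D, supporting_images, members_in) are assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Line2D:
    """A 2-D line segment extracted from one image (hypothetical layout)."""
    image_id: int
    start: Point2D
    end: Point2D


@dataclass
class Line3D:
    """A 3-D line that keeps the 2-D segments it was grouped from."""
    start: Point3D
    end: Point3D
    # The "member" features: the original 2-D segments, one or more per view.
    members: List[Line2D] = field(default_factory=list)

    def supporting_images(self) -> List[int]:
        """Return the sorted ids of the images that contribute a member."""
        return sorted({m.image_id for m in self.members})

    def members_in(self, image_id: int) -> List[Line2D]:
        """Return the member segments observed in one particular image."""
        return [m for m in self.members if m.image_id == image_id]


# Usage: a 3-D line grouped from segments seen in images 0 and 2.
seg_a = Line2D(image_id=0, start=(10.0, 20.0), end=(50.0, 20.0))
seg_b = Line2D(image_id=2, start=(12.0, 25.0), end=(52.0, 25.0))
line = Line3D(start=(0.0, 0.0, 8.0), end=(40.0, 0.0, 8.0), members=[seg_a, seg_b])
print(line.supporting_images())
```

Keeping the members on the 3-D feature lets a later grouping step consult per-image properties of the original segments rather than only the synthesized end-points.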

To reduce the computation we apply a level-of-detail technique with probabilistic relaxation and introduce various techniques for the efficient filtering, such as information fusion (with range data), probabilistic height estimation, and the use of expandable Bayesian networks [12]. Our approach shows good description results on complex buildings. Our system detects and describes buildings of polygonal boundaries with complex roofs (including superstructures). Such a level of complexity is unprecedented in previous work (among those which do not use high-quality range data).

Section snippets

Representation and approach

We apply a model-based approach to obtain high-level geometric information and robust detection results. Usually, in model-based approaches, when the complexity of a building model (for example, the number of rooftop corners allowed) increases, the search space for the grouping increases exponentially. Hence, many of the practical building description systems have used simple models such as collections of rectangular rooftops [17] or simple blocks with gable roofs [16]. While collections of

Preprocessing

We use an image-derived unedited DEM (of about 1/2 of the image resolution) generated by the commercial “SocetSet” product from BAE Systems. Although DEM data do not give an explicit building model, as shown in Fig. 4, they can still give a rough idea of where the buildings are located. Thus, we follow the approach of Huertas et al. [10] to generate rough cues from a DEM image for further processing. The DEM image is first convolved with a Laplacian-of-Gaussian filter to smooth the image and locate

Three-dimensional feature grouping

In ABERS, two types of 3-D features are used: 3-D linears and 3-D junctions. A 3-D feature is a group of 2-D features from different views. To obtain 3-D features, we find pair-wise matches of 2-D features using epipolar geometry, and group the matched features among different views. The height of a 3-D feature is estimated from the pair-wise height estimates.
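The snippet above does not say how the pair-wise height estimates are combined. As a minimal sketch of one plausible rule, assuming each image pair yields a height together with a variance, the estimates can be fused by inverse-variance weighting. The function name fuse_heights and the (height, variance) input format are hypothetical, and this is not claimed to be the rule used in ABERS.

```python
from typing import Iterable, Tuple


def fuse_heights(estimates: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (height, variance) pairs by inverse-variance weighting.

    Returns the fused height and its variance. This is an assumed
    combination rule, used here only for illustration.
    """
    pairs = list(estimates)
    if not pairs:
        raise ValueError("at least one pair-wise estimate is required")
    if any(var <= 0.0 for _, var in pairs):
        raise ValueError("variances must be positive")

    # Each estimate is weighted by the inverse of its variance,
    # so more certain pair-wise estimates count for more.
    weights = [1.0 / var for _, var in pairs]
    total_weight = sum(weights)
    fused = sum(h * w for (h, _), w in zip(pairs, weights)) / total_weight
    return fused, 1.0 / total_weight


# Usage: two equally uncertain estimates from two image pairs.
height, variance = fuse_heights([(10.0, 1.0), (12.0, 1.0)])
print(height, variance)
```

With equal variances this reduces to the plain average of the pair-wise heights, and the fused variance shrinks as more image pairs agree.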

ABERS has a unique grouping strategy where the properties of low-level features are utilized in the grouping procedure of several

Rooftop boundary hypotheses generation

ABERS operates in two modes: one is for detecting flat rooftops only (which takes less time), the other includes sloping roofs. For flat rooftops, we use DEM layers (Section 3). We apply hypotheses generation, verification (Section 6), and overlap analysis (Section 7) repeatedly for each DEM layer. For sloping roofs, we assume that the outer boundaries (eaves) are parallel to the ground (Section 2). Therefore, to generate the rooftop boundary hypothesis of a sloping roof, we apply the same

Hypotheses verification

Once rooftop hypotheses are obtained, supporting evidence is collected for them. This consists of line support, wall vertical line support, darkness of the cast shadow region, and closeness of a hypothesis to the boundary of a DEM layer.
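To make the idea of combining several evidence types concrete, the following sketch uses a naive Bayes update as a simplified stand-in. It is not the expandable Bayesian network used in the paper, it assumes the evidence items are conditionally independent, and the function name and the likelihood-ratio values are invented for illustration.

```python
from typing import Iterable


def posterior_building(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Return P(building | evidence) under a naive Bayes assumption.

    Each likelihood ratio is P(evidence | building) / P(evidence | not building)
    for one evidence item, such as line support or shadow darkness.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        # Independent evidence multiplies the odds in favor of a building.
        odds *= ratio
    return odds / (1.0 + odds)


# Usage: hypothetical ratios for line support, wall verticals,
# cast shadow darkness, and closeness to a DEM layer boundary.
score = posterior_building(0.5, [2.0, 2.0, 1.0, 1.0])
print(score)
```

A ratio above 1 raises the score, a ratio below 1 lowers it, and a ratio of 1 leaves it unchanged, which is the behavior one would want from uninformative evidence.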

Line support consists of the supporting (RP) and the distracting (RN) line evidence. We use the line scoring function given in [17]. Given a 3-D rooftop boundary hypothesis, its projection (a 2-D polygon) onto each image is calculated. For each side of the polygon,

Overlap analysis

It is common for more than one hypothesis to be verified for a single building component, where these hypotheses represent parts of an actual building as in Fig. 26B. We aim to choose the best possible building component. However, comparing two verified hypotheses according to their verification score, P(Building | Evidence) of the EBN in Fig. 25B, is not accurate because that binary classifier is not designed and learned to compare two good building hypotheses but to determine whether a certain

Superstructure analysis

For a multi-layered building complex, we need to consider the interaction among building components. Consider a building complex shown in Fig. 29A. A rooftop boundary hypothesis for the superstructure can be found with the suggested approach, but it will have weak wall and shadow evidence support when the estimation of the shadow and the wall does not consider the interaction with the base building. Therefore, it is desirable to first find the base building for the accurate verification of the

Time complexity

To estimate the time complexity of ABERS, two factors are considered: the average number of 2-D linears per image, l, and the number of images, n. The number of junctions, which is bounded by O(l), is usually much smaller than that of the linears (Section 3.2). The number of linear matches in one image pair is O(l) when the possible height ranges are fixed. The actual numbers vary according to the image configuration, for example, alignment of epipolar lines and building sides and the complexity

Experimental results

We show results on several examples in this section. Unfortunately, it is difficult to acquire large data sets with multiple image coverage for a valid statistical evaluation. In addition, most building detection and description systems have different representational powers, and statistical evaluation on a small number of examples is less meaningful when the results depend strongly on how the test dataset is chosen.

We first show the results on flat buildings. Fig. 35A is the detection

Conclusion

We have presented an approach to the detection and description of buildings with complex-shaped rooftops and shown results on some challenging examples. The problem of modeling complex buildings retains many complexities requiring substantial future research, but we believe that this work points to a promising approach. Our method uses multiple images and multiple cues such as results obtained by region matching stereo analysis and feature-based matching. We have described perceptual grouping

Acknowledgment

This research was supported by a MURI subgrant from Purdue University under Army Research Office Grant No. DAAH04-96-1-0444. Part of the low-level processing (Section 3 and Section 4.2.2) is a result of joint research with Andres Huertas.

Cited by (65)

  • Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts

    2013, ISPRS Journal of Photogrammetry and Remote Sensing
    Citation Excerpt :

    The first group handles overlapping images one by one, similar to the monocular image processing, and uses additional images for verification (e.g., Mohan and Nevatia, 1989; Collins et al., 1998; Noronha and Nevatia, 2001; Xiao et al., 2012). The second group benefits from the stereo/multiple images at the earliest stages of the processing (e.g., Fischer et al., 1998; Cord and Declercq, 2001; Cord et al., 2001; Fradkin et al., 2001; Kim and Nevatia, 2004). Both groups of approaches were also evaluated in a study conducted by Paparoditis et al. (1998).

  • Matching of straight line segments from aerial stereo images of urban areas

    2012, ISPRS Journal of Photogrammetry and Remote Sensing
    Citation Excerpt :

    In any case, the search space for matches has to be pruned in some way in order to limit the matching complexity. For most of the studies, basic geometric parameters of line segments such as orientation, length, mid-point, etc. are involved to filter the set of correspondence hypotheses; however, probably the most preferred constraint is the quadrilateral constraint generated using the epipolar geometry (e.g. Roux and McKeown, 1994; Moons et al., 1998; Heuel and Förstner, 2001; Noronha and Nevatia, 2001; Kim and Nevatia, 2004; Suveg and Vosselman, 2004). Some studies also investigated the radiometric information around the line segments (e.g. Bignone et al., 1996; Schmid and Zisserman, 1997; Henricsson, 1998; Baillard et al., 1999; Scholze et al., 2000; Zhang and Baltsavias, 2000) or the information extracted from image gradients (Bignone et al., 1996; Baillard and Dissard, 2000; Wang et al., 2009).

  • Building extraction from oblique airborne imagery based on robust façade detection

    2012, ISPRS Journal of Photogrammetry and Remote Sensing
  • 3D Scene interpretation by combining probability theory and logic: The tower of knowledge

    2011, Computer Vision and Image Understanding
    Citation Excerpt :

    In their system, Bayesian networks and utility theory were used to automate the recognition in aerial images, taking into consideration the various uncertainties in the data and in the process. Kim and Nevatia developed an Automatic Building Extraction and Reconstruction System (ABRES) which was used for detecting and describing compositions of buildings with flat or complex rooftops from multiple aerial images [27]. Probabilistic reasoning, level-details and expandable Bayesian networks were used to recognise the final models, given a set of multiple view images.

  • Aligning archive maps and extracting footprints for analysis of historic urban environments

    2011, Computers and Graphics (Pergamon)
    Citation Excerpt :

    The next section covers relevant previous work. Constructing three-dimensional models of existing cultural heritage sites has received significant attention in two areas, namely laser scanning [5–7] and photogrammetry [8]. Whilst these approaches have been used extensively to record, measure and preserve cultural heritage sites they are only capable of displaying the current state of the environment.