Automatic description of complex buildings from multiple images
Introduction
Three-dimensional object description is a key task of computer vision. One practical application of the 3-D object description problem is building detection and description from aerial images. It can greatly improve the automation of 2-D or 3-D map generation, which is useful in various applications including radio-wave reachability tests for wireless communications, computer graphics, virtual reality, and mission planning.
Building detection and description has been an active research area [6]. Early systems used a single intensity image and were effective for simple buildings [9], [11], [14], [15]. In general, multiple aerial images can be obtained at small extra cost; thus, most recent work in building detection has focused on stereo or multi-view analysis [1], [2], [3], [7], [17], [18].
There are several challenges for the building detection and description problem.
- Figure–ground separation: we deal with outdoor images, and it is hard to separate building boundaries from other distracting lines such as road boundaries. Moreover, lines and corners of buildings are often broken or missing due to occlusion or accidental alignments.
- Representation: as in other 3-D object description problems, the model representation plays an important role in the building detection and description problem. With a simpler representation, such as extrusions of rectangular rooftops [14], [17], the description result is more robust but the detection rate is lower because of the limited representational power. On the contrary, with a model of extremely high representational power, such as refined polygonal meshes [1], [2], we can describe many more buildings, but the result is less robust, and the level of geometric information obtained is so limited (we get only polygonal meshes) that the usability of the result is greatly reduced. It is therefore important to find a representation that offers sufficient representational power and rich geometric information while, at the same time, admitting a robust and computationally affordable detection and description algorithm.
- Information fusion: the types of available information vary depending on the application. In most cases, stereo or multiple images are available. Range data can be generated from stereo analysis, but its quality is not good enough to generate building hypotheses directly, because many building roofs lack sufficient texture for stereo processing. In addition, nearby trees of similar height make the use of such range data difficult. Sometimes accurate range data, such as LIDAR, are available (at high cost). There have been efforts to maximize the use of such high-quality data [1], [4] or to improve the quality of image-based range data by using more than 10 images [20]. In this paper, we focus on the use of image-derived, unedited range data. In this case, how to combine information from the images and the low-quality range data is very important.
We present an approach for detecting and describing complex buildings using multiple, overlapping images of the scene. We use low-quality range data, such as that from fully automated stereo processing, as additional information. We apply a hypothesize-and-verify paradigm, in which lower-level features are grouped (hypothesized) into higher-level ones and then filtered (verified) to keep the computation tractable (otherwise, the computation would be exponential). We present a unique feature-grouping approach for lines and junctions in which the low-level properties (and uncertainties) are carried up to the highest level. For example, a 3-D line feature is not just two endpoints synthesized from 2-D line features; it also includes the actual set of 2-D line features (its “member” features), and we use the properties of the member features intensively in the higher-level grouping.
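The member-feature idea can be sketched as a data structure in which a 3-D linear retains its 2-D members rather than only synthesized endpoints. The class and field names below (`Line2D`, `Linear3D`, `members`) are illustrative stand-ins, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Line2D:
    view: int        # index of the image this feature came from
    p0: tuple        # endpoint (x, y) in image coordinates
    p1: tuple
    contrast: float  # example of a low-level property carried upward

@dataclass
class Linear3D:
    p0: tuple        # synthesized 3-D endpoint (x, y, z)
    p1: tuple
    # the actual set of 2-D member features, kept for higher-level grouping
    members: list = field(default_factory=list)

    def views(self):
        """Views contributing evidence; usable in later grouping stages."""
        return sorted({m.view for m in self.members})

line3d = Linear3D(p0=(0, 0, 10), p1=(5, 0, 10),
                  members=[Line2D(0, (10, 20), (60, 20), 0.8),
                           Line2D(1, (12, 25), (63, 24), 0.7)])
print(line3d.views())  # -> [0, 1]
```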
To reduce the computation, we apply a level-of-detail technique with probabilistic relaxation and introduce various techniques for efficient filtering, such as information fusion (with range data), probabilistic height estimation, and the use of expandable Bayesian networks [12]. Our approach shows good description results on complex buildings. Our system detects and describes buildings with polygonal boundaries and complex roofs (including superstructures). Such a level of complexity is unprecedented in previous work that does not use high-quality range data.
Section snippets
Representation and approach
We apply a model-based approach to obtain high-level geometric information and robust detection results. Usually, in model-based approaches, when the complexity of a building model (for example, the number of rooftop corners allowed) increases, the search space for grouping grows exponentially. Hence, many practical building description systems have used simple models such as collections of rectangular rooftops [17] or simple blocks with gable roofs [16]. While collections of
Preprocessing
We use an image-derived, unedited DEM (at about half the image resolution) generated by the commercial “SocetSet” product from BAE Systems. Although DEM data do not give an explicit building model, as shown in Fig. 4, they can still give a rough idea of where the buildings are located. Thus, we follow the approach of Huertas et al. [10] to generate rough cues from a DEM image for further processing. The DEM image is first convolved with a Laplacian-of-Gaussian filter to smooth the image and locate
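As a rough illustration of this cueing step, the sketch below convolves a toy DEM with a Laplacian-of-Gaussian kernel; the filter responds only near elevation discontinuities (candidate building outlines) and is silent on flat ground and flat roofs. The kernel size, sigma, and toy DEM are assumed values, not the SocetSet or Huertas et al. processing:

```python
import numpy as np

def log_kernel(sigma=2.0, size=9):
    """Discrete Laplacian-of-Gaussian kernel, normalized to zero sum."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()  # zero-sum: constant-elevation regions respond with 0

def convolve2d(img, kern):
    """Plain 2-D convolution with edge padding (no external dependencies)."""
    s = kern.shape[0] // 2
    padded = np.pad(img, s, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for i in range(kern.shape[0]):
        for j in range(kern.shape[1]):
            out += kern[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

dem = np.zeros((40, 40))
dem[10:25, 12:30] = 10.0  # toy DEM: flat ground plus a 10 m flat-roofed block
response = convolve2d(dem, log_kernel())
# zero on flat ground and on the flat roof; nonzero along the roof outline
print(abs(response[0, 0]) < 1e-9, abs(response[17, 20]) < 1e-9)
```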
Three-dimensional feature grouping
In ABERS, two types of 3-D features are used: 3-D linears and 3-D junctions. A 3-D feature is a group of 2-D features from different views. To obtain 3-D features, we find pair-wise matches of 2-D features using epipolar geometry, and group the matched features among different views. The height of a 3-D feature is estimated from the pair-wise height estimates.
ABERS has a unique grouping strategy where the properties of low-level features are utilized in the grouping procedure of several
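The pair-wise matching step can be illustrated under a simplifying rectified-stereo assumption, where epipolar lines are horizontal image rows; matched segments then yield a disparity and hence a height estimate. The tolerance and camera constants below are assumed values for the sketch, not from the actual system:

```python
def match_lines(lines_a, lines_b, row_tol=2.0):
    """Match 2-D segments whose endpoint rows agree (epipolar constraint
    under rectification); each line is (y0, y1, x0, x1)."""
    matches = []
    for i, (y0a, y1a, x0a, x1a) in enumerate(lines_a):
        for j, (y0b, y1b, x0b, x1b) in enumerate(lines_b):
            if abs(y0a - y0b) <= row_tol and abs(y1a - y1b) <= row_tol:
                disparity = ((x0a - x0b) + (x1a - x1b)) / 2.0
                matches.append((i, j, disparity))
    return matches

def height(disparity, f_B=1000.0, z_cam=100.0):
    """Depth from disparity (f_B = focal length x baseline, assumed),
    converted to height above the ground for a camera at z_cam."""
    return z_cam - f_B / disparity

# one roof edge seen in two views, shifted by a 20-pixel disparity
left  = [(50.0, 50.0, 100.0, 200.0)]
right = [(50.0, 50.0,  80.0, 180.0)]
m = match_lines(left, right)
print(m)                # -> [(0, 0, 20.0)]
print(height(m[0][2]))  # -> 50.0
```

In the general (unrectified) multi-view case, the quadrilateral defined by the epipolar lines of the segment endpoints plays the role of the row tolerance here.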
Rooftop boundary hypotheses generation
ABERS operates in two modes: one detects flat rooftops only (which takes less time); the other also includes sloped roofs. For flat rooftops, we use DEM layers (Section 3). We apply hypothesis generation, verification (Section 6), and overlap analysis (Section 7) repeatedly for each DEM layer. For sloped roofs, we assume that the outer boundaries (eaves) are parallel to the ground (Section 2). Therefore, to generate the rooftop boundary hypothesis of a sloped roof, we apply the same
Hypotheses verification
Once rooftop hypotheses are obtained, supporting evidence is collected for them. This consists of line support, wall vertical line support, darkness of the cast shadow region, and closeness of a hypothesis to the boundary of a DEM layer.
Line support consists of supporting (RP) and distracting (RN) line evidence. We use the line scoring function given in [17]. Given a 3-D rooftop boundary hypothesis, its projection (a 2-D polygon) onto each image is computed. For each side of the polygon,
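A minimal sketch of such positive/negative line evidence is given below, in the spirit of the scoring in [17] but not its actual formula: image lines close to a projected rooftop side and nearly parallel to it count as supporting (RP), while nearby lines at odd angles count as distracting (RN). The distance and angle thresholds are assumed values:

```python
import math

def line_support(side, image_lines, dist_tol=3.0, ang_tol=math.radians(10)):
    """Count supporting (RP) and distracting (RN) lines for one polygon side."""
    (x0, y0), (x1, y1) = side
    side_ang = math.atan2(y1 - y0, x1 - x0)
    rp = rn = 0
    for (a, b) in image_lines:
        mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        # perpendicular distance of the line's midpoint from the side
        nx, ny = -(y1 - y0), (x1 - x0)
        d = abs((mx - x0) * nx + (my - y0) * ny) / math.hypot(nx, ny)
        if d > dist_tol:
            continue  # too far away to be evidence either way
        ang = math.atan2(b[1] - a[1], b[0] - a[0])
        # undirected angle difference between the line and the side
        diff = abs((ang - side_ang + math.pi / 2) % math.pi - math.pi / 2)
        if diff <= ang_tol:
            rp += 1   # near and parallel: supporting evidence
        else:
            rn += 1   # near but misaligned: distracting evidence
    return rp, rn

side = ((0.0, 0.0), (100.0, 0.0))
lines = [((10.0, 1.0), (60.0, 1.0)),   # parallel and close -> supporting
         ((40.0, 2.0), (45.0, -2.0))]  # oblique and close  -> distracting
print(line_support(side, lines))       # -> (1, 1)
```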
Overlap analysis
It is common for more than one hypothesis to be verified for a single building component, where these hypotheses represent parts of an actual building, as in Fig. 26B. We aim to choose the best possible building component. However, comparing two verified hypotheses by their verification score, P(Building ∣ Evidence) of the EBN in Fig. 25B, is not accurate, because that binary classifier is not designed and trained to compare two good building hypotheses but to determine whether a certain
Superstructure analysis
For a multi-layered building complex, we need to consider the interaction among building components. Consider the building complex shown in Fig. 29A. A rooftop boundary hypothesis for the superstructure can be found with the suggested approach, but it will have weak wall and shadow evidence support if the estimation of the shadow and the wall does not consider the interaction with the base building. Therefore, it is desirable to first find the base building for the accurate verification of the
Time complexity
To estimate the time complexity of ABERS, two factors are considered: the average number of 2-D linears per image, l, and the number of images, n. The number of junctions is usually much smaller than the number of linears (Section 3.2) and is bounded by O(l). The number of linear matches in one image pair is O(l) when the possible height ranges are fixed. The actual numbers vary with the image configuration, for example, the alignment of epipolar lines and building sides and the complexity
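As a back-of-envelope illustration of these counts (illustrative numbers only, not measurements from the system): with n(n − 1)/2 image pairs and O(l) candidate matches per pair, the overall candidate-match budget grows as O(n²l):

```python
def pairwise_match_budget(l, n):
    """Order-of-magnitude count of candidate line matches over all image
    pairs, assuming O(l) matches per pair with fixed height ranges."""
    pairs = n * (n - 1) // 2
    return pairs * l

# e.g. 500 linears per image across 4 views: 6 pairs x 500 candidates
print(pairwise_match_budget(l=500, n=4))  # -> 3000
```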
Experimental results
We show results on several examples in this section. Unfortunately, it is difficult to acquire large data sets with multiple-image coverage for a valid statistical evaluation. In addition, most building detection and description systems have different representational powers, and statistical evaluation on a small number of examples is less meaningful when the results depend strongly on the choice of test dataset.
We first show the results on flat buildings. Fig. 35A is the detection
Conclusion
We have presented an approach to the detection and description of buildings with complex-shaped rooftops and shown results on some challenging examples. The problem of modeling complex buildings retains many complexities requiring substantial future research, but we believe that this work points to a promising approach. Our method uses multiple images and multiple cues, such as results obtained by region-matching stereo analysis and feature-based matching. We have described perceptual grouping
Acknowledgment
This research was supported by a MURI subgrant from Purdue University under Army Research Office Grant No. DAAH04-96-1-0444. Part of the low-level processing (Section 3 and Section 4.2.2) is a result of joint research with Andres Huertas.
References
- et al., The ascender system: automated site modeling from multiple aerial images, Comput. Vision Image Understand. (1998)
- et al., Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D, Comput. Vision Image Understand. (1998)
- et al., Incremental reconstruction of 3-D scenes from multiple, complex images, Artif. Intell. (1986)
- et al., Detecting buildings in aerial images, Comput. Vision Graph. Image Process. (1988)
- et al., Building detection and description from a single intensity image, Comput. Vision Image Understand.
(1998) - B. Ameri, Feature based model verification (FBMV): a new concept for validation in building reconstruction, in: Proc....
- C. Baillard, A. Zisserman, Automatic reconstruction of piecewise planar models from multiple views, in: Proc. IEEE...
- M. Cord, M. Jordan, J.-P. Cocquerez, N. Paparoditis, Automatic extraction and modelling of urban buildings from high...
- A. Gruen, R. Nevatia (Eds.), Computer Vision and Image Understanding: Special Issue on Automatic Building Extraction...
- S. Heuel, W. Förstner, Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective...
2011, Computers and Graphics (Pergamon)Citation Excerpt :The next section covers relevant previous work. Constructing three-dimensional models of existing cultural heritage sites has received significant attention in two areas, namely laser scanning [5–7] and photogrammetry [8]. Whilst these approaches have been used extensively to record, measure and preserve cultural heritage sites they are only capable of displaying the current state of the environment.