Enhancing accuracy and expressive power of range query answers over incomplete spatial databases via a novel reasoning approach

https://doi.org/10.1016/j.datak.2011.03.002

Abstract

Modern spatial database applications built on top of distributed and heterogeneous spatial information sources, such as conventional spatial databases underlying Geographical Information Systems (GIS), spatial data files, and spatial information acquired or inferred from the Web, suffer from data integration and topological consistency problems. This increasingly results in incomplete information, which makes answering range queries over incomplete spatial databases a leading challenge in spatial database systems research. A significant instance of this setting is the application scenario in which the geometrical information on a subset of spatial database objects is incomplete, whereas the spatial database still stores topological relations among these objects (e.g., containment relations). Focusing on this scenario, in this paper we propose and experimentally assess a novel technique for efficiently answering range queries over incomplete spatial databases by integrating geometrical information and topological reasoning. We also propose I-SQE (Spatial Query Engine for Incomplete Information), an innovative query engine implementing this technique. Our technique proves to be both effective and efficient on synthetic and real-life spatial data sets, and it enhances the quality and the expressive power of retrieved answers by taking advantage of the possibility of representing spatial database objects at both the geometrical and the topological level.

Introduction

In the heterogeneous spatial database environments of modern Geographical Information Systems (GIS), data repositories collected and integrated from different spatial information sources very often coexist. Here, data integration is the main issue to be faced [7], and it has been extensively studied in the context of spatial databases (e.g., [8], [20]). Conventional spatial databases such as those underlying autonomous GIS, raw data files storing geographical information, and GIS-related Web pages are popular instances of these sources. Furthermore, the proliferation of Web- and Grid-service-based applications and systems built on top of spatial data repositories leads to an Internet-wide dissemination of spatial information sources, which makes the integration issues of spatial database systems even harder to deal with. The popularity of spatial data repositories within modern complex information systems, such as Web, Grid and P2P systems, is clearly an opportunity: it lays the foundation for further studies in the field and, symmetrically, for the industrial proliferation of spatial database systems. However, spatial data repositories collected and integrated from different spatial information sources also pose several research challenges. Spatial information sources that differ in data models, data formats, ranges of data domains, null-value handling policies and so forth very often coexist in the same GIS, and this heterogeneity leads to the presence of incomplete spatial information [3], [14], [27]. Another leading challenge in spatial database systems research is extending the capabilities of conventional query engines so that they can deal with the several heterogeneous representations of spatial information (e.g., [33]) that very often arise in actual GIS.
This paradigm pursues the idea of representing the same spatial information kept in spatial database objects at different levels, or layers, in order to enhance the expressive power of both abstraction and reasoning capabilities over spatial data. It should be noted that the latter challenge is a critical aspect of spatial database systems research, as modern complex information systems are increasingly heterogeneous in the nature and kind of their underlying data repositories, so that heterogeneous representations of spatial information arise accordingly. As a consequence, spatial query engines interfacing these systems have to cope with the resulting data integration issues.

Inspired by these considerations, in this paper we present a novel technique for answering range queries over incomplete spatial databases, such as those that derive from integrating distributed and heterogeneous spatial information sources, along with an innovative query engine implementing this technique, called I-SQE (Spatial Query Engine for Incomplete Information). In particular, we investigate the problem of answering range queries over spatial databases where spatial information is modeled and represented at two different levels, i.e. the geometrical level and the topological level. Moreover, for a subset of the spatial database objects stored in the target spatial database interfaced by I-SQE, one of these two levels can be missing; as a consequence, spatial information is incomplete, and so is the spatial database. The main goal of our research is to devise intelligent techniques for answering range queries over this kind of spatial database while overcoming the limitations due to incompleteness.

In a conventional spatial database, spatial information is usually represented by means of detailed geometrical properties of spatial database objects. This is because the geometrical representation is the most complete one that can be provided for spatial database objects: for instance, given a collection of spatial database objects, topological relations among these objects (e.g., containment relations) can be derived from their geometrical properties. In our research, we consider an application scenario where spatial information can be incomplete, i.e. a subset of spatial database objects is described only by their topological relations with other spatial database objects stored in the target spatial database, whereas the geometrical information about these objects is missing. As a consequence, conventional spatial query engines, which rely on the complete availability of geometrical information about spatial database objects, are not capable of effectively and efficiently answering spatial queries involving such objects.
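As a minimal illustration of how a topological relation can be derived from geometry, the sketch below classifies pairs of objects under the simplifying assumption that each object's geometry is a closed axis-aligned rectangle. `Rect` and `coarse_relation` are hypothetical names, and the classification is deliberately coarse: it is not a full topological model such as Egenhofer's binary relations, and "contains" here also includes containment where boundaries touch.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Closed axis-aligned rectangle; assumed to be the object's geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def within(inner: Rect, outer: Rect) -> bool:
    """True if every point of `inner` also lies in `outer`."""
    return (outer.xmin <= inner.xmin and inner.xmax <= outer.xmax
            and outer.ymin <= inner.ymin and inner.ymax <= outer.ymax)


def disjoint(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share no point."""
    return (a.xmax < b.xmin or b.xmax < a.xmin
            or a.ymax < b.ymin or b.ymax < a.ymin)


def coarse_relation(a: Rect, b: Rect) -> str:
    """Coarse relation of `a` to `b`, derived purely from geometry.

    'contains' also covers containment where boundaries touch; this is
    coarser than a full topological model such as the 9-intersection.
    """
    if a == b:
        return "equal"
    if within(b, a):
        return "contains"
    if within(a, b):
        return "inside"
    if disjoint(a, b):
        return "disjoint"
    return "intersects"  # overlap or boundary contact
```

For example, `coarse_relation(Rect(0, 0, 10, 10), Rect(2, 2, 3, 3))` returns `"contains"`: the containment fact is computed from geometry rather than stored, which is exactly what becomes impossible once the geometry of one of the two objects is missing.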

To give an example, consider the simple case of a spatial database representing the streets of a given urban area, along with their geometry (i.e., geometrical information is available). Furthermore, assume that the spatial database also stores topological relations between regional areas and streets, while the geometry of regional areas is not known (i.e., topological information is available, whereas geometrical information is not). The simplest topological relation is containment, which models the fact that a regional area A contains a set of streets $\{S_0, S_1, \ldots, S_{N-1}\}$, with $N > 0$. If only the geometrical layer is exploited to answer range queries over the spatial database, then users only retrieve geometrical information about streets, whereas topological information on regional areas is not exploited at all. It should be noted that, in a scenario like the one described above, knowledge extracted from topological relations represents a critical “add-in” value for modern GIS applications and systems, as this information can be further exploited to enhance the knowledge discovery phase from spatial databases.
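The street example can be sketched as follows, under simplifying assumptions that are ours and not the paper's: streets are polylines given by their vertices, the range query is a closed rectangle, and the containment relation is a plain dictionary from area to street identifiers. The hypothetical function `areas_certainly_hit` reports a regional area as a certain answer whenever one of its contained streets has a vertex inside the query window, because if A contains S and S intersects the window, then A intersects the window too.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed


def point_in_window(p: Point, w: Window) -> bool:
    x, y = p
    return w[0] <= x <= w[2] and w[1] <= y <= w[3]


def areas_certainly_hit(
    streets: Dict[str, List[Point]],   # street id -> polyline vertices (known geometry)
    contains: Dict[str, List[str]],    # area id -> contained street ids (topology only)
    window: Window,
) -> List[str]:
    """Return areas with unknown geometry that certainly intersect `window`.

    Assumption: a street vertex inside the window proves the street
    intersects it. The test is sufficient but not necessary, since a
    street may cross the window without having a vertex inside it.
    """
    hit = []
    for area, street_ids in contains.items():
        if any(point_in_window(p, window)
               for s in street_ids for p in streets.get(s, [])):
            hit.append(area)
    return sorted(hit)
```

For instance, with streets S0 = [(1, 1), (2, 1)] and S1 = [(8, 8), (9, 9)], area A containing S0 and area B containing S1, the window (0, 0, 3, 3) yields ["A"], even though the geometry of neither area is stored.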

In contrast to the example above, in a spatial database server implementing our query answering technique for incomplete spatial information, users can integrate the knowledge kept at both levels, i.e. the geometrical and the topological level, thus taking advantage of both data representation models. Moreover, this paradigm is also “self-alimenting” (i.e., self-feeding): new topological relations among queried spatial database objects can be derived by means of simple yet effective composition rules over already-extracted topological relations made available in the spatial database system via the query task.
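Composition rules can be illustrated with the simplest one, the transitivity of containment. The sketch below is an illustration under our own assumptions rather than the paper's rule set: `containment_closure` is a hypothetical helper that repeatedly applies "A contains B and B contains C imply A contains C" until no new fact is produced.

```python
from typing import Dict, Set


def containment_closure(contains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Derive every containment fact implied by transitivity.

    Rule: contains(A, B) and contains(B, C) imply contains(A, C).
    Iterates until a fixpoint is reached (no new pair is derived).
    """
    closure = {a: set(bs) for a, bs in contains.items()}  # do not mutate the input
    changed = True
    while changed:
        changed = False
        for a in list(closure):
            derived: Set[str] = set()
            for b in closure[a]:
                derived |= closure.get(b, set())
            new = derived - closure[a]
            if new:
                closure[a] |= new
                changed = True
    return closure
```

For example, from {"Region": {"District"}, "District": {"Street1"}} the closure adds the fact that Region contains Street1, a relation that was never stored explicitly.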

A secondary contribution of our research is an innovative approach for representing and computing topological relations in spatial database systems via data compression paradigms. This allows us to efficiently store and query topological information, whose management constitutes a critical bottleneck in GIS built on top of very large spatial data repositories.
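The compression scheme itself is not described in this excerpt, so the sketch below uses a generic, well-known technique in its place: delta encoding followed by variable-length (varint) integers, applied to the sorted identifiers of the objects that a given object contains. The helpers `encode_ids` and `decode_ids` are hypothetical and assume non-negative integer object identifiers.

```python
from typing import List


def encode_ids(ids: List[int]) -> bytes:
    """Delta-encode sorted non-negative IDs, then write each gap as a varint."""
    out = bytearray()
    prev = 0
    for i in sorted(ids):
        gap = i - prev
        prev = i
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte of this gap, continuation flag clear
    return bytes(out)


def decode_ids(data: bytes) -> List[int]:
    """Inverse of encode_ids: read varints and prefix-sum the gaps."""
    ids: List[int] = []
    cur, shift, gap = 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            cur += gap
            ids.append(cur)
            gap, shift = 0, 0
    return ids
```

Encoding [3, 130, 131] produces the gaps 3, 127 and 1, each of which fits in one byte, so the three identifiers occupy 3 bytes instead of 12 as 32-bit integers. Dense containment lists give small gaps, which is where this kind of encoding pays off.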

The remainder of this paper is organized as follows. Section 2 provides an illustrative example that demonstrates the benefits of our proposed approach for answering range queries over incomplete spatial databases. In Section 3, we review significant efforts related to our research. Section 4 describes our technique for answering range queries over incomplete spatial databases by integrating geometrical information and topological reasoning. In Section 5, we introduce an innovative solution for representing and computing topological relations via a data compression approach. In Section 6, we present I-SQE in detail, along with its main principles, components and reference architecture. Section 7 presents our experimental results on both synthetic and real-life spatial data sets. Finally, Section 8 draws conclusions and discusses future work.


Illustrative example

In order to better describe the benefits of our proposed approach for answering range queries over incomplete spatial databases, in this Section we provide an illustrative example built on top of a toy (incomplete) spatial database. Let D denote the example incomplete spatial database. Let $G_D = \{A, B, C, D\}$ (see Fig. 1(a)) denote the set of spatial database objects of D for which geometrical information is available, which, for the sake of simplicity, we call geometrical objects. Let

Related work

In recent years, the proliferation of spatial data repositories has posed several challenges related to the integration of data from distributed and heterogeneous spatial information sources. For instance, the huge quantity of spatial data available on the Web makes it possible to acquire and integrate such data within GIS, even in a semi-automatic manner [32]. Nevertheless, methods for spatial data acquisition are manifold, and each GIS software package makes use of different and heterogeneous

Effectively and efficiently answering range queries over incomplete spatial databases via reasoning on incompleteness of spatial information

In this Section, we present our technique for integrating geometrical information and topological reasoning in order to answer range queries over incomplete spatial databases, which is implemented by algorithm evaluateRangeQuery.

As highlighted in Section 1, in our research we focus on the challenging case of incomplete spatial databases where the geometrical information associated with a subset of the spatial database objects stored in the target spatial database is missing,
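One possible way to reason about such objects, sketched here under our own assumptions rather than as algorithm evaluateRangeQuery itself, is a three-valued classification. Suppose an object X with missing geometry is known to lie inside an object G whose geometry (here, a rectangle) is available. If G lies within the query window Q, X is a certain answer; if G is disjoint from Q, X is excluded; in every other case the stored facts do not decide, and X is only a possible answer. `classify_by_container` and the shape of its inputs are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed


def box_within(inner: Box, outer: Box) -> bool:
    return (outer[0] <= inner[0] and inner[2] <= outer[2]
            and outer[1] <= inner[1] and inner[3] <= outer[3])


def box_disjoint(a: Box, b: Box) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def classify_by_container(
    geometry: Dict[str, Box],   # objects whose geometry is known
    inside_of: Dict[str, str],  # object with missing geometry -> its known container
    query: Box,
) -> Dict[str, str]:
    """Label each object with missing geometry as 'certain', 'no' or 'possible'.

    X inside G and G within Q   => X within Q    -> 'certain'
    X inside G and G disjoint Q => X disjoint Q  -> 'no'
    otherwise the known facts do not decide      -> 'possible'
    """
    labels: Dict[str, str] = {}
    for obj, container in inside_of.items():
        g = geometry[container]
        if box_within(g, query):
            labels[obj] = "certain"
        elif box_disjoint(g, query):
            labels[obj] = "no"
        else:
            labels[obj] = "possible"
    return labels
```

Keeping possible answers separate from certain ones makes explicit which results are guaranteed by the stored topology and which would require the missing geometry to confirm.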

Compressing and computing topological relations in spatial database systems

As highlighted in Sections 1 and 4, in our reference spatial database scenario we assume that topological relations among spatial database objects are stored and made available in the target spatial database system. Furthermore, topological relations are indexed by means of a high-performance B-tree indexing data structure, which allows us to easily
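Python's standard library has no B-tree, so the following sketch uses a sorted list with `bisect` as a stand-in: like a B-tree, it keeps keys in order, so all facts sharing a (subject, relation) prefix can be retrieved with a single range scan. `RelationIndex` is a hypothetical class, and representing facts as (subject, relation, object) string triples is our own assumption, not the paper's storage layout.

```python
import bisect
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


class RelationIndex:
    """Ordered index over topological facts keyed by (subject, relation, object).

    A sorted list stands in for the B-tree: both keep keys ordered, so all
    facts with the same (subject, relation) prefix form one contiguous range.
    """

    def __init__(self) -> None:
        self._keys: List[Fact] = []

    def add(self, subject: str, relation: str, obj: str) -> None:
        key = (subject, relation, obj)
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:  # skip duplicates
            self._keys.insert(pos, key)

    def lookup(self, subject: str, relation: str) -> List[str]:
        """Return every object related to `subject` by `relation`, in key order."""
        # "" sorts before any other string, so this is the start of the prefix range.
        lo = bisect.bisect_left(self._keys, (subject, relation, ""))
        result = []
        for s, r, o in self._keys[lo:]:
            if (s, r) != (subject, relation):
                break
            result.append(o)
        return result
```

Unlike a real B-tree, insertion into the sorted list is linear in the number of stored facts; the sketch only shows why ordered keys make prefix lookups such as "everything A contains" cheap.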

I-SQE: architecture and functionalities

I-SQE is characterized by a multi-layer architecture, which is shown in Fig. 5. Each layer of the I-SQE architecture deals with a specific abstraction of the approach we propose for answering range queries over incomplete spatial databases via integrating geometrical information and topological reasoning.

The main components of I-SQE are the following.

  • Data Integration Module (DIM): This component deals with the problem of integrating spatial data coming from different and heterogeneous spatial

Experimental results

In this Section, we present the experimental results obtained by stressing the performance of algorithm evaluateRangeQuery (see Section 4) against both synthetic and real-life spatial data sets. Our hardware/software infrastructure consisted of a workstation equipped with an Intel Core 2 Duo processor at 2 GHz and 2 GB of RAM, running Mac OS X as its operating system. As regards the programming language, evaluateRangeQuery was implemented in C.

Conclusions and future work

Focusing on the challenging application scenario represented by GIS built on top of very large and incomplete spatial data repositories, in this paper we have presented and experimentally assessed a novel technique for answering range queries over incomplete spatial databases via integrating geometrical information and topological reasoning, along with an innovative query engine implementing this technique, I-SQE. In particular, we have investigated an application scenario in which topological


References (39)

  • Utah Automated Geographic Center. USA_UT Data Set.
  • S. Conrad. Föderierte Datenbanksysteme (1997).
  • Open Geospatial Consortium. Geography Markup Language.
  • World Wide Web Consortium. Scalable Vector Graphics (SVG) 1.0 Specification.
  • World Wide Web Consortium. Simple Object Access Protocol (SOAP) 1.1 Specification.
  • S.M.R. Dehak et al. Spatial Reasoning with Incomplete Information on Relative Positioning. IEEE Transactions on Pattern Analysis and Machine Intelligence (2005).
  • M.J. Egenhofer. Reasoning About Binary Topological Relations.
  • M. Essid et al. Query Processing in a Geographic Mediation System.
  • A. Gal et al. Aggregate Query Answering Under Uncertain Schema Mappings.


Alfredo Cuzzocrea is currently a Senior Researcher at the Institute of High Performance Computing and Networking of the Italian National Research Council, Italy, and an Adjunct Professor at the Department of Electronics, Computer Science and Systems of the University of Calabria, Italy. His research interests include multidimensional data modeling and querying, data stream modeling and querying, data warehousing and OLAP, OLAM, XML data management, Web information systems modeling and engineering, knowledge representation and management models and techniques, and Grid and P2P computing. He is the author or co-author of more than 130 papers in refereed international conferences (including EDBT, SSDBM, ADBIS, DOLAP, SAC, COMPSAC, DEXA, DaWaK, FQAS, ISMIS, IDEAS, SEKE, and WISE) and international journals (including JCSS, DKE, JIIS, KAIS, CPE, IJDWM, WIAS, and IJBIDM). He serves as a program committee member of refereed international conferences (including ICDE, EDBT, SSDBM, MDM, CIKM, ER, DASFAA, ICDM, SDM, PKDD, PAKDD, ICML, ICDCS, SAC, FQAS, and WISE) and as a review board member of refereed international journals (including TODS, TKDE, TSMC, TSC, JCSS, IS, DKE, JIIS, KAIS, IPL, TPLP, COMPJ, DPDB, INS, IJSEKE, and FGCS). He also serves as PC Chair of several international conferences and as Guest Editor of international journals such as JCSS, IS, DKE, KAIS, FI, and IJBIDM.

Andrea Nucita is currently a Researcher at the Faculty of Computer Science at the University of Messina, Italy. He received a PhD degree in Computer Science from the University of Milan in 2004. His research interests are related to geographical information systems, query optimization in the context of spatial databases, and medical informatics.

He has co-authored publications in international conference proceedings and international journals in the above-mentioned research fields.
