Elsevier

Information Systems

Volume 31, Issue 6, September 2006, Pages 489-511

A new range query algorithm for Universal B-trees

https://doi.org/10.1016/j.is.2004.12.001

Abstract

In multi-dimensional databases the essential tool for accessing data is the range query (or window query). In this paper we introduce a new algorithm for processing range queries in the universal B-tree (UB-tree), an index structure for searching in multi-dimensional databases. The new range query algorithm (called the DRU algorithm) works efficiently even for high-dimensional databases. In particular, with the DRU algorithm many of the UB-tree inner nodes need not be accessed. We explain the DRU algorithm using a simple geometric model that provides clear insight into the problem. More specifically, the model exploits an interesting relation between the Z-curve and generalized quad-trees. We also present experimental results for an implementation of the DRU algorithm.

Introduction

In the area of database systems, the emergence of new forms of databases requires the development of appropriate access methods. Unlike single-dimensional databases, which are indexed and searched according to a simple key (e.g. using B-trees), we often need to access data according to a composite key (in general, according to several attributes). We call such databases multi-dimensional, since each data instance is represented by a vector of simple values. A collection of data vectors (data tuples) can be interpreted as a set of points in a multi-dimensional vector space.

Let us introduce the notation needed for further discussion:

Definition 1 (Vector space)

A discrete vector space $\Omega$ is defined as the Cartesian product of finite domains $D_i$, i.e. $\Omega = D_1 \times D_2 \times \cdots \times D_n$. The vector space $\Omega$ has $n$ dimensions, and each particular domain $D_i$ is associated with the $i$th dimension of the space. Each point (in the universe $\Omega$) or data tuple (in the database) is represented by a vector $o = [o_1, o_2, \ldots, o_n]$, $o_i \in D_i$.

For the sake of simplicity, we assume that the vector space $\Omega$ is a hyper-cube determined as the $n$th power of a single domain $D$, i.e. $\Omega = D^n$, where $D$ is a linearly ordered interval of integers $D = \langle 0, 2^p - 1 \rangle$. The cardinality of $D$ is $|D| = 2^p$ for some integer $p$.

One of the most popular queries required for access to multi-dimensional databases is the range query (also called window query or rectangular query), by which the user specifies an interval of values $\langle a_i, b_i \rangle$ (for each attribute $A_i$) that the retrieved data tuples have to match. The range query can be represented by a hyper-box $QB$ in the space $\Omega$. The ranges of the query box $QB$ are defined by two boundary points, the lower bound $QB_{low} = [a_1, a_2, \ldots, a_n]$ and the upper bound $QB_{up} = [b_1, b_2, \ldots, b_n]$, where $a_1 \le b_1, a_2 \le b_2, \ldots, a_n \le b_n$. The purpose of a range query is to select all data tuples inside the query box $QB$, i.e. all tuples $o$ satisfying $a_i \le o_i \le b_i$ for $1 \le i \le n$ (see Fig. 1).
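The range query definition translates directly into a membership predicate. The following minimal Python sketch (the function names are illustrative, not taken from the paper) checks $a_i \le o_i \le b_i$ in every dimension, with `low` playing the role of $QB_{low}$ and `up` of $QB_{up}$, and answers a range query by a sequential scan, the baseline that a spatial index tries to avoid.

```python
from typing import Sequence


def in_query_box(o: Sequence[int], low: Sequence[int], up: Sequence[int]) -> bool:
    """True iff a_i <= o_i <= b_i holds in every dimension i (low = QB_low, up = QB_up)."""
    return all(a <= x <= b for x, a, b in zip(o, low, up))


def range_query_scan(data, low, up):
    """Answer a range query by scanning every tuple (no index involved)."""
    assert len(low) == len(up) and all(a <= b for a, b in zip(low, up)), "need a_i <= b_i"
    return [o for o in data if in_query_box(o, low, up)]
```

For example, `range_query_scan([(1, 2), (3, 7), (5, 5), (0, 0)], (1, 2), (5, 6))` returns `[(1, 2), (5, 5)]`; the tuple `(3, 7)` is rejected because $7 > b_2 = 6$.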

However, the classic indexing methods, which maintain $n$ single-dimensional indices, are inefficient for range queries. For that reason, a class of indexing methods called spatial access methods (SAMs) has been developed, which allow multi-dimensional data to be indexed and queried efficiently.

In this paper we introduce a new range query algorithm for the universal B-tree (UB-tree), a spatial access method based on the B+-tree and Z-ordering. With the new algorithm (called the DRU algorithm), we address two issues. First, the existing algorithms are either inefficient or only vaguely described; our approach, on the other hand, is described in detail, both geometrically and algorithmically. Second, the DRU algorithm works more efficiently in high-dimensional indices.

The paper is organized as follows. Section 2 overviews the basic concepts of the UB-tree. Section 3 discusses the problem of range query processing and related work. Section 4 describes the DRU algorithm, and Section 5 explains the geometric principles behind it. Section 6 analyses the experimental results, and Section 7 concludes the paper.

Section snippets

UB-tree

In Bayer [4], the Universal B-tree was introduced as a data structure for indexing multi-dimensional databases. In simple terms, the UB-tree can be characterized as a combination of the well-known B+-tree with Z-ordering. Using Z-ordering, each multi-dimensional data tuple is transformed into an integer (called
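Z-ordering is commonly computed by interleaving the bits of the coordinates. The sketch below assumes $p$-bit coordinates and one particular convention: bits are taken level by level starting from the most significant, and within each level dimension 1 contributes first. Other conventions order the dimensions differently, so this is illustrative rather than the exact encoding used in [4]; the function name `z_address` is hypothetical.

```python
def z_address(o, p):
    """Interleave the p-bit coordinates of o into one n*p-bit Z-address.

    Assumed convention: coordinate bits are consumed from the most significant
    level down, and within each level dimension 1 (o[0]) contributes first.
    """
    if any(not 0 <= x < (1 << p) for x in o):
        raise ValueError("coordinates must lie in <0, 2^p - 1>")
    z = 0
    for level in range(p - 1, -1, -1):  # most significant coordinate bit first
        for x in o:                      # dimensions 1..n in order
            z = (z << 1) | ((x >> level) & 1)
    return z
```

For instance, `z_address((2, 1), 2)` interleaves `10` and `01` into `1001`, i.e. `9`. Sorting the four points of a $2 \times 2$ grid by `z_address(o, 1)` yields `(0, 0), (0, 1), (1, 0), (1, 1)`, the characteristic Z shape.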

Related work

Although a range query algorithm was already proposed together with the introduction of the UB-tree, in the following subsection we begin the discussion with an earlier and more general approach.

The down-right-up algorithm

In our approach, we have focused on the basic, straightforward idea that a range query must search only those leaves whose Z-regions intersect the query box. This can be performed via a single downward traversal of the UB-tree. The UB-tree is traversed in LIFO (last-in-first-out) fashion, and each visited node is examined to determine whether the Z-regions of its child nodes intersect the query box. Only the intersected nodes are processed further. At the leaf level, all data tuples located inside the query
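The LIFO traversal described above can be sketched with an explicit stack. This is a hypothetical in-memory model: the `Node` class, its fields, and the helper functions are assumptions for illustration, not the paper's data structures. Each inner node lists its children together with their Z-regions, assumed here to be Z-address intervals $[\alpha, \beta]$ under MSB-first bit interleaving, and only children whose Z-region intersects the query box are pushed. The intersection helper simply enumerates the Z-addresses of a region, which is viable only for tiny spaces; the DRU algorithm replaces it with an efficient test.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical UB-tree node: a leaf holds tuples; an inner node holds
    (alpha, beta, child) entries, [alpha, beta] being the child's Z-region."""
    tuples: list = field(default_factory=list)
    children: list = field(default_factory=list)


def z_decode(z, n, p):
    """Inverse of MSB-first bit interleaving (dimension 1 first within each level)."""
    o = [0] * n
    for k in range(n * p):  # k = bit position in z, 0 = least significant
        o[n - 1 - k % n] |= ((z >> k) & 1) << (k // n)
    return tuple(o)


def in_box(o, low, up):
    return all(a <= x <= b for x, a, b in zip(o, low, up))


def zregion_intersects_bruteforce(alpha, beta, low, up, n, p):
    """Placeholder test: enumerate every Z-address in the region (tiny spaces only)."""
    return any(in_box(z_decode(z, n, p), low, up) for z in range(alpha, beta + 1))


def range_query(root, low, up, n, p):
    """Single downward LIFO traversal; returns (matching tuples, nodes visited)."""
    result, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not node.children:  # leaf: keep only tuples inside the query box
            result.extend(o for o in node.tuples if in_box(o, low, up))
            continue
        # push in reverse so that children are popped in ascending Z-order
        for alpha, beta, child in reversed(node.children):
            if zregion_intersects_bruteforce(alpha, beta, low, up, n, p):
                stack.append(child)
    return result, visited


# Usage: a 4x4 space (n=2, p=2) whose leaves cover Z-regions [0:3], [4:9], [10:15].
leaves = [Node(tuples=[(0, 0), (1, 1)]), Node(tuples=[(0, 2), (2, 1)]), Node(tuples=[(2, 2), (3, 3)])]
root = Node(children=[(0, 3, leaves[0]), (4, 9, leaves[1]), (10, 15, leaves[2])])
hits, visited = range_query(root, (2, 2), (3, 3), n=2, p=2)
```

In the usage example the query box $[2,3] \times [2,3]$ covers Z-addresses 12 to 15, so only the root and the third leaf are visited (`visited == 2`) and `hits` is `[(2, 2), (3, 3)]`.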

Z-region intersection

In this section we discuss some properties of the Z-curve that help us understand the shape of a Z-region. Using this information, we can design an algorithm for testing the intersection between a Z-region and a query box.
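One concrete way to perform such a test, which is not the paper's linear-time operation but a simpler correct alternative, exploits the quad-tree view of the Z-curve: split the Z-interval $[\alpha, \beta]$ into maximal aligned blocks $[s, s + 2^k - 1]$. Under bit interleaving, each aligned block is an axis-aligned hyper-box (a cell of a generalized quad-tree), so the Z-region intersects the query box iff one of these boxes does. The sketch assumes MSB-first interleaving with dimension 1 first within each level, and treats a Z-region as an interval of Z-addresses; the function names are hypothetical. It produces $O(np)$ blocks with $O(np)$ work each, so it is quadratic rather than linear in the Z-address bit-length.

```python
def z_decode(z, n, p):
    """Inverse of MSB-first bit interleaving (dimension 1 first within each level)."""
    o = [0] * n
    for k in range(n * p):  # k = bit position in z, 0 = least significant
        level, dim = k // n, n - 1 - (k % n)
        o[dim] |= ((z >> k) & 1) << level
    return tuple(o)


def zregion_intersects(alpha, beta, low, up, n, p):
    """Does the Z-region [alpha, beta] contain a point of the box [low, up]?

    The interval is split greedily into maximal aligned blocks
    [s, s + 2**k - 1]; each block is an axis-aligned box whose corners are
    z_decode(s) and z_decode(s + 2**k - 1).
    """
    s = alpha
    while s <= beta:
        k = 0
        # grow the block while s stays aligned to 2**(k+1) and the block fits in [s, beta]
        while (s & ((1 << (k + 1)) - 1)) == 0 and s + (1 << (k + 1)) - 1 <= beta:
            k += 1
        lo = z_decode(s, n, p)
        hi = z_decode(s + (1 << k) - 1, n, p)
        # two boxes intersect iff their ranges overlap in every dimension
        if all(l <= b and a <= h for l, h, a, b in zip(lo, hi, low, up)):
            return True
        s += 1 << k
    return False
```

For example, `zregion_intersects(0, 3, (2, 2), (3, 3), n=2, p=2)` returns `False`, because Z-addresses 0 to 3 form the single cell $[0,1] \times [0,1]$.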

Experimental results

We conducted a set of experiments with synthetic datasets of increasing dimensionality. The data tuples were generated in uniformly distributed clusters of a fixed radius (using the L2 metric, see Fig. 18 for the 2D case) and indexed using the UB-tree. The number of tuples was increasing with
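Data of this kind can be approximated as follows. This is a sketch under assumptions: the experimental parameters (number of clusters, radius, number of tuples, random seed) are not stated here, so they are free parameters; cluster centers are drawn uniformly from $\Omega$, and each tuple is drawn uniformly from the L2 ball around a randomly chosen center, then rounded and clipped to the domain $\langle 0, 2^p - 1 \rangle$. The function name `clustered_dataset` is hypothetical.

```python
import math
import random


def clustered_dataset(num_tuples, num_clusters, radius, n, p, seed=0):
    """Return (centers, tuples): integer tuples in <0, 2^p - 1>^n grouped in
    L2 balls of the given radius around uniformly placed centers."""
    rng = random.Random(seed)
    dmax = (1 << p) - 1
    centers = [tuple(rng.uniform(0, dmax) for _ in range(n)) for _ in range(num_clusters)]
    data = []
    for _ in range(num_tuples):
        c = rng.choice(centers)
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]    # random direction
        norm = math.sqrt(sum(x * x for x in g)) or 1.0
        r = radius * rng.random() ** (1.0 / n)        # uniform density in the n-ball
        data.append(tuple(min(dmax, max(0, round(ci + r * gi / norm)))
                          for ci, gi in zip(c, g)))
    return centers, data
```

Normalizing a Gaussian vector gives a uniformly random direction in any dimension, and scaling the radius by $u^{1/n}$ makes the density uniform inside the ball. Rounding can move a tuple at most $\sqrt{n}/2$ beyond the radius, while clipping to the domain never moves it farther from its center.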

Conclusions and outlook

In this paper we have proposed a new algorithm (called the DRU algorithm) for range query processing in the Universal B-tree (UB-tree). The DRU algorithm uses an operation that detects intersection between a Z-region and the query box, which makes query processing more efficient. The Z-region intersection operation has time complexity linear in the bit-length of the Z-address.

The experimental results have shown that the DRU algorithm makes the UB-tree suitable for efficient search in

Acknowledgements

This research has been partially supported by grant No. GAČR 201/03/0912 of the Grant Agency of the Czech Republic.

References (19)

  • V. Gaede et al.

    Multidimensional access methods

    ACM Comput. Surv.

    (1998)
  • C. Böhm et al.

    Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases

    ACM Comput. Surv.

    (2001)
  • R. Fenk

    The BUB-tree

  • R. Bayer

    The universal B-tree for multidimensional indexing: general concepts

  • J.A. Orenstein et al.

    A class of data structures for associative searching

  • C. Faloutsos

    Gray codes for partial match and range queries

    IEEE Trans. Softw. Eng.

    (1988)
  • C. Faloutsos et al.

    Fractals for secondary key retrieval

  • J. Orenstein

    A comparison of spatial query processing techniques for native and parameter spaces

  • H. Sagan

    Space-Filling Curves

    (1994)
There are more references available in the full text version of this article.

Cited by (26)

  • Learned index for spatial queries

    2019, Proceedings - IEEE International Conference on Mobile Data Management
  • Dictionary compression in point cloud data management

    2019, ACM Transactions on Spatial Algorithms and Systems

Recommended by Patrick O’Neil, Area Editor.
