Synonyms

Bx-Tree; Linearization; Moving objects; Peano curve; Range query algorithm

Definition

The Bx-tree (Jensen et al. 2004) is a query- and update-efficient B+-tree-based index structure for moving objects that are represented as linear functions. The Bx-tree uses a linearization technique to exploit the volatility of the data values being indexed, i.e., moving-object locations. Specifically, data values are first partitioned according to their update time and then linearized within the partitions according to a space-filling curve, e.g., the Peano or Hilbert curve. The resulting values are combined with their time partition information and then indexed by a single B+-tree. Figure 1 shows an example of the Bx-tree with the number of index partitions equal to two within one maximum update interval \(\Delta t_{mu}\). In this example, a maximum of three partitions exist at the same time. After linearization, object locations inserted at time 0 are indexed in partition 1, object locations updated during time 0 to \(0.5\Delta t_{mu}\) are indexed in partition 2, and object locations updated during time \(0.5\Delta t_{mu}\) to time \(\Delta t_{mu}\) are indexed in partition 3 (as indicated by arrows). As time elapses, the first range repeatedly expires (shaded area), and a new range is appended (dashed line). This use of rolling ranges enables the Bx-tree to handle time effectively.

Indexing of Moving Objects, Bx-Tree, Fig. 1 An example of the Bx-tree

Historical Background

Traditional indexes for multidimensional databases, such as the R-tree (Guttman 1984) and its variants, were, implicitly or explicitly, designed with the main objective of supporting efficient query processing as opposed to enabling efficient updates. This works well in applications where queries are much more frequent than updates. However, applications involving the indexing of moving objects exhibit workloads characterized by heavy loads of updates in addition to frequent queries.

Several new index structures have been proposed for moving-object indexing. One may distinguish between indexing of the past positions versus indexing of the current and near-future positions of spatial objects. The Bx-tree belongs to the latter category.

Past positions of moving objects are typically approximated by polylines composed of line segments. It is possible to index line segments by R-trees, but the trajectory memberships of segments are not taken into account. In contrast to this, the spatio-temporal R-tree (Pfoser et al. 2000) attempts to also group segments according to their trajectory memberships, while also taking spatial locations into account. The trajectory-bundle tree (Pfoser et al. 2000) aims only for trajectory preservation, leaving other spatial properties aside. Another example of this category is the multi-version 3DR-tree (Tao and 2001), which combines multi-version B-trees and R-trees. Using partial persistence, multi-version B-trees guarantee time slice performance that is independent of the length of the history indexed.

The representations of the current and near-future positions of moving objects are quite different, as are the indexing challenges and solutions. Positions are represented as points (constant functions) or functions of time, typically linear functions. The Lazy Update R-tree (Kwon et al. 2002) aims to reduce update cost by handling updates of objects that do not move outside their leaf-level MBRs specially, and a generalized approach to bottom-up update in R-trees has recently been examined (Lee et al. 2003).

Tayeb et al. (1998) use PMR-quadtrees for indexing the future linear trajectories of one-dimensional moving points as line segments in (x, t)-space. The segments span the time interval that starts at the current time and extends some time into the future, after which a new tree must be built. Kollios et al. (1999) employ dual transformation techniques, which represent the position of an object moving in a d-dimensional space as a point in a 2d-dimensional space. Their work is largely theoretical in nature. Based on a similar technique, Patel et al. (2004) have developed a practical indexing method, termed STRIPES, that supports efficient updates and queries at the cost of higher space requirements. Other representative indexes are the TPR-tree (time-parameterized R-tree) family of indexes (e.g., Saltenis et al. 2000; Tao et al. 2003), which add time parameters to the bounding boxes of the traditional R-tree.

Scientific Fundamentals

Index Structure

The base structure of the Bx-tree is that of the B+-tree. Thus, the internal nodes serve as a directory. Each internal node contains a pointer to its right sibling (the pointer is non-null if one exists). The leaf nodes contain the moving-object locations being indexed and corresponding index time.

To construct the Bx-tree, the key step is to map object locations to single-dimensional values. A space-filling curve is used for this purpose. Such a curve is a continuous path that visits every point in a discrete, multidimensional space exactly once and never crosses itself. These curves are effective in preserving proximity, meaning that points close in multidimensional space tend to be close in the one-dimensional space obtained by the curve. Current versions of the Bx-tree use the Peano curve (or Z-curve) and the Hilbert curve. Although other curves may be used, these two are expected to be particularly good according to the analytical and empirical studies in Moon et al. (2001). In what follows, the value obtained from the space-filling curve is termed the x_value.
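As a minimal sketch of this linearization step, the following Python function computes a Z-curve value by interleaving the bits of two cell coordinates. The name `z_value`, the `order` parameter, and the choice to place each x bit before the corresponding y bit are assumptions made for illustration; the opposite bit order gives an equally valid Z-curve.

```python
def z_value(x: int, y: int, order: int = 3) -> int:
    """Interleave the low `order` bits of x and y, x bit first (assumed order)."""
    if not (0 <= x < 1 << order and 0 <= y < 1 << order):
        raise ValueError("coordinates must lie in the 2^order x 2^order grid")
    z = 0
    for bit in range(order - 1, -1, -1):  # most significant bit first
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


if __name__ == "__main__":
    # Print the 6-bit curve values of a few cells of an 8 x 8 grid.
    for cell in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]:
        print(cell, format(z_value(*cell), "06b"))
```

Cells that are close in the grid often, though not always, receive nearby curve values, which is the proximity-preserving behavior the index relies on.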

An object location is given by \(O = (\overrightarrow{x},\overrightarrow{v})\), a position and a velocity, and an update time, or timestamp, \(t_{u}\), at which these values are valid. Note that the use of linear functions reduces the number of updates to one third in comparison to constant functions. In a leaf-node entry, an object O updated at \(t_{u}\) is represented by a value \(B^{x}value(O,t_{u})\):

$$\displaystyle{ B^{x}value(O,t_{ u}) = [\mathit{index}{\_}{\mathit{partition}}]{}_{2} \oplus [x{\_}{\it \text{rep}}]{}_{2} }$$
(1)

where index_partition is an index partition determined by the update time, x_rep is obtained using a space-filling curve, \([x]_{2}\) denotes the binary value of x, and ⊕ denotes concatenation.

If the timestamped object locations are indexed without differentiating them based on their timestamps, the proximity-preserving property of the space-filling curve will be lost, and the index will also be ineffective in locating an object based on its x_value. To overcome such problems, the index is “partitioned” by placing entries in partitions based on their update time. More specifically, \(\Delta t_{mu}\) denotes the maximum duration between two updates of any object location. The time axis is partitioned into intervals of duration \(\Delta t_{mu}\), and each such interval is sub-partitioned into n equal-length sub-intervals, termed phases. By mapping the update times in the same phase to the same so-called label timestamp and by using the label timestamps as prefixes of the representations of the object locations, index partitions are obtained, and the update time of an update determines the partition it goes to. In particular, an update with timestamp \(t_{u}\) is assigned a label timestamp \(t_{lab} = \lceil t_{u} + \Delta t_{mu}/n\rceil_{l}\), where operation \(\lceil x\rceil_{l}\) returns the nearest future label timestamp of x. For example, Fig. 1 shows a Bx-tree with n = 2. Objects with timestamp \(t_{u} = 0\) obtain label timestamp \(t_{lab} = 0.5\Delta t_{mu}\); objects with \(0 < t_{u} \leq 0.5\Delta t_{mu}\) obtain label timestamp \(t_{lab} = \Delta t_{mu}\); and so on. Next, for an object with label timestamp \(t_{lab}\), its position at \(t_{lab}\) is computed according to its position and velocity at \(t_{u}\). Then the space-filling curve is applied to this (future) position to obtain the second component of Eq. 1.

This mapping has two main advantages. First, it enables the tree to index object positions valid at different times, overcoming the limitation of the B+-tree, which is only able to index a snapshot of all positions at the same time. Second, it reduces the update frequency compared to having to update the positions of all objects at each timestamp when only some of them need to be updated. The two components of the mapping function in Eq. 1 are consequently defined as follows:

$$\displaystyle\begin{array}{rcl} \mathit{index}{\_}\mathit{partition} & = & (t_{lab}/(\varDelta t_{mu}/n) - 1) \bmod (n + 1) \\ x{\_}\mathit{rep} & = & x{\_}\mathit{value}(\overrightarrow{x} + \overrightarrow{v} \cdot (t_{lab} - t_{u})) \end{array}$$

With the transformation, the Bx-tree will contain data belonging to n + 1 phases, each given by a label timestamp and corresponding to a time interval. The value of n needs to be carefully chosen since it affects query performance and storage space. A large n results in smaller enlargements of query windows (covered in the following subsection), but also results in more partitions and therefore a looser relationship among object locations. In addition, a large n yields a higher space overhead due to more internal nodes.
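The rolling behavior of the partitions can be illustrated with a short Python sketch. It assumes that the label timestamps are the multiples of \(\Delta t_{mu}/n\) and that \(\lceil x\rceil_{l}\) rounds x up to the nearest such multiple; the function names and the default values (\(\Delta t_{mu} = 120\), n = 2) are chosen for illustration only.

```python
import math


def label_timestamp(t_u: float, dt_mu: float, n: int) -> float:
    """Nearest label timestamp at or after t_u + dt_mu / n (assumed rounding rule)."""
    phase = dt_mu / n
    # Inputs that are not exactly representable as floats may need a tolerance here.
    return math.ceil((t_u + phase) / phase) * phase


def index_partition(t_lab: float, dt_mu: float, n: int) -> int:
    """Partition number (t_lab / (dt_mu / n) - 1) mod (n + 1)."""
    phase = dt_mu / n
    return (round(t_lab / phase) - 1) % (n + 1)


def partitions_for(update_times, dt_mu=120.0, n=2):
    """Partition numbers assigned to a sequence of update times."""
    return [index_partition(label_timestamp(t, dt_mu, n), dt_mu, n)
            for t in update_times]


if __name__ == "__main__":
    for t_u in [0, 10, 60, 100, 130, 190, 250]:
        t_lab = label_timestamp(t_u, 120.0, 2)
        print(t_u, t_lab, index_partition(t_lab, 120.0, 2))
```

With n = 2 the partition numbers cycle through 0, 1, and 2 as time advances, so the key range of an expired partition is reused by a later phase.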

To exemplify, let n = 2, \(\Delta t_{mu} = 120\), and assume a Peano curve of order 3 (i.e., the space domain is 8 × 8). Object positions O1 = ((7, 2), (−0.1, 0.05)), O2 = ((0, 6), (0.2, −0.3)), and O3 = ((1, 2), (0.1, 0.1)) are inserted at times 0, 10, and 100, respectively. The \(B^{x}value\) for each object is calculated as follows.

  • Step 1: Calculate label timestamps and index partitions.

    $$\displaystyle\begin{array}{rcl} t_{lab}^{1}\,& =& \,\lceil (0 + 120/2)\rceil _{ l}\, =\, 60, {}\\ & & \mathit{index}{\_}{\mathit{partition}}^{1}\, =\, 0\, =\, (00)_{ 2} {}\\ t_{lab}^{2}\,& =& \,\lceil (10 + 120/2)\rceil _{ l}\, =\, 120, {}\\ & & \mathit{index}{\_}{\mathit{partition}}^{2}\, =\, 1\, =\, (01)_{ 2} {}\\ t_{lab}^{3}\,& =& \,\lceil (100 + 120/2)\rceil _{ l}\, =\, 180, {}\\ & & \mathit{index}{\_}{\mathit{partition}}^{3}\, =\, 2\, =\, (10)_{ 2} {}\\ \end{array}$$
  • Step 2: Calculate positions \(x_{1}^{\prime}\), \(x_{2}^{\prime}\), and \(x_{3}^{\prime}\) at \(t_{lab}^{1}\), \(t_{lab}^{2}\), and \(t_{lab}^{3}\), respectively.

    $$\displaystyle\begin{array}{rcl} x_{1}^{{\prime}}\,& =& \,(1,5) {}\\ x_{2}^{{\prime}}\,& =& \,(2,3) {}\\ x_{3}^{{\prime}}\,& =& \,(4,1) {}\\ \end{array}$$
  • Step 3: Calculate Z-values.

    $$\displaystyle\begin{array}{rcl} [Z{\_}{\mathit{value}}(x_{1}^{{\prime}})]{}_{ 2}\, & =& \, (010011)_{2} {}\\ {} [Z{\_}{\mathit{value}}(x_{2}^{{\prime}})]_{ 2}\, & =& \,(001101)_{2} {}\\{} [Z{\_}{\mathit{value}}(x_{3}^{{\prime}})]{}_{ 2}\, & =& \, (100001)_{2} {}\\ \end{array}$$
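The steps above can be combined into a single key as in Eq. 1. The following Python sketch is illustrative only: it takes the projected cells from Step 2 as given, assumes an x-bit-first Z-curve, and assumes that concatenation places the partition number in the bits above the 6-bit Z-value. All function names are hypothetical.

```python
import math

DT_MU, N, ORDER = 120, 2, 3   # values of the worked example
PHASE = DT_MU // N            # length of one phase


def label_timestamp(t_u: int) -> int:
    """Round t_u + PHASE up to the next multiple of PHASE."""
    return math.ceil((t_u + PHASE) / PHASE) * PHASE


def index_partition(t_lab: int) -> int:
    return (t_lab // PHASE - 1) % (N + 1)


def z_value(x: int, y: int) -> int:
    """Bit interleaving, x bit first (assumed bit order)."""
    z = 0
    for bit in range(ORDER - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def bx_value(t_u: int, projected_cell) -> int:
    """Concatenate the partition number with the Z-value of the projected cell."""
    partition = index_partition(label_timestamp(t_u))
    return (partition << (2 * ORDER)) | z_value(*projected_cell)


if __name__ == "__main__":
    # (update time, projected cell from Step 2)
    for name, t_u, cell in [("O1", 0, (1, 5)), ("O2", 10, (2, 3)), ("O3", 100, (4, 1))]:
        key = bx_value(t_u, cell)
        print(name, format(key, "08b"), key)
```

Under these assumptions the three keys are 19, 77, and 161, i.e., the bit strings 00010011, 01001101, and 10100001.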

Range Query Algorithm

A range query retrieves all objects whose location falls within the rectangular range \(q = ([qx_{1}^{l}, qx_{1}^{u}], [qx_{2}^{l}, qx_{2}^{u}])\) at time \(t_{q}\) not prior to the current time (“l” denotes lower bound, and “u” denotes upper bound).

A key challenge is to support predictive queries, i.e., queries that concern future times. Traditionally, indexes that use linear functions handle predictive queries by means of bounding box enlargement (e.g., the TPR-tree). The Bx-tree, in contrast, uses query-window enlargement. Since the Bx-tree stores an object’s location as of some time after its update time, the enlargement involves two cases: a location must either be brought back to an earlier time or forward to a later time. Consider the example in Fig. 2, where \(t_{ref}\) denotes the time as of which the locations of four moving objects are stored in the index, and where predictive queries \(q_{1}\) and \(q_{2}\) (solid rectangles) have time parameters \(t_{q1}\) and \(t_{q2}\), respectively.

Indexing of Moving Objects, Bx-Tree, Fig. 2 Query window enlargement

The figure shows the stored positions as solid dots, and the positions of the first two objects at \(t_{q1}\) and of the last two objects at \(t_{q2}\) as circles. The two positions for each object are connected by an arrow. The relationship between the two positions for each object is \(p_{i}^{{\prime}}\, =\, p_{i} +\overrightarrow{ v} \cdot (t_{q} - t_{{\it \text{ref}}})\). The first two of the four objects are thus in the result of the first query, and the last two objects are in the result of the second query. To obtain this result, query rectangle \(q_{1}\) needs to be enlarged to \(q_{1}^{\prime}\) (dashed). This is achieved by attaching maximum speeds to the sides of \(q_{1}\): \(v_{1}^{l}\), \(v_{2}^{l}\), \(v_{1}^{u}\), and \(v_{2}^{u}\). For example, \(v_{1}^{u}\) is obtained as the largest projection onto the x-axis of a velocity of an object in \(q_{1}\). For \(q_{2}\), the enlargement speeds are computed similarly. For example, \(v_{2}^{u}\) is obtained by projecting all velocities of objects in \(q_{2}\) onto the y-axis; \(v_{2}^{u}\) is then set to the largest speed multiplied by −1.

The enlargement of query \(q = ([qx_{1}^{l}, qx_{1}^{u}], [qx_{2}^{l}, qx_{2}^{u}])\) is given by query \(q^{\prime} = ([eqx_{1}^{l}, eqx_{1}^{u}], [eqx_{2}^{l}, eqx_{2}^{u}])\):

$$\displaystyle{ eqx_{i}^{l}\, = \left \{\begin{array}{ll} qx_{i}^{l} + v_{ i}^{l} \cdot (t_{\mathit{ ref }} - t_{q}) &\text{if}\;t_{q} < t_{\mathit{ref }} \\ qx_{i}^{l} + v_{i}^{u} \cdot (t_{q} - t_{\mathit{ref }})&\text{otherwise}\\ \end{array} \right. }$$
(2)
$$\displaystyle{ eqx_{i}^{u}\, = \left \{\begin{array}{ll} qx_{i}^{u}\, +\, v_{ i}^{u} \cdot (t_{\mathit{ ref }}\, -\, t_{q})&\text{if}\;t_{q}\, <\, t_{\mathit{ref }} \\ qx_{i}^{u}\, +\, v_{i}^{l} \cdot (t_{q}\, -\, t_{\mathit{ref }}) &\text{otherwise}\\ \end{array} \right. }$$
(3)
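The enlargement can be sketched in Python as follows. This is an illustrative variant rather than a literal transcription of Eqs. 2 and 3: it assumes that signed minimum and maximum velocity components (`v_min`, `v_max`, hypothetical names) are available per dimension, and it derives the bounds from the relation between stored and queried positions, so that the sign adjustment described for the forward case is handled implicitly.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def enlarge_window(q: List[Interval], v_min: List[float], v_max: List[float],
                   t_q: float, t_ref: float) -> List[Interval]:
    """Enlarge query window q (valid at t_q) to cover positions stored as of t_ref.

    A stored position p and the position p' at t_q satisfy
    p = p' + v * (t_ref - t_q), so each bound is shifted by the extreme
    displacement that any velocity in [v_min, v_max] can produce.
    """
    dt = t_ref - t_q
    enlarged = []
    for (lo, hi), vmin, vmax in zip(q, v_min, v_max):
        shifts = (vmin * dt, vmax * dt)
        enlarged.append((lo + min(shifts), hi + max(shifts)))
    return enlarged


if __name__ == "__main__":
    q = [(2.0, 4.0), (1.0, 3.0)]
    # Query time before and after the reference time of the stored positions.
    print(enlarge_window(q, [-1.0, -0.5], [2.0, 0.5], t_q=5.0, t_ref=10.0))
    print(enlarge_window(q, [-1.0, -0.5], [2.0, 0.5], t_q=15.0, t_ref=10.0))
```

Taking the minimum and maximum of the two candidate shifts covers both the backward case (\(t_{q} < t_{ref}\)) and the forward case with one expression.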

The computation of the enlargement speeds proceeds in two steps. The speeds are first set to the maximum speeds of all objects, so that a preliminary \(q^{\prime}\) is obtained. Then, with the aid of a two-dimensional histogram (e.g., a grid) that captures, for the objects in each cell, the maximum and minimum projections of their velocities onto the axes, the final enlargement speeds in the area where the query window resides are obtained. Such a histogram can easily be maintained in main memory.

Next, the partitions of the Bx-tree are traversed to find objects falling in the enlarged query window \(q^{\prime}\). In each partition, the use of a space-filling curve means that a range query in the native, two-dimensional space becomes a set of range queries in the transformed, one-dimensional space (see Fig. 3); hence multiple traversals of the index result. These traversals are optimized by calculating the start and end points of the one-dimensional ranges and traversing the intervals by “jumping” in the index.

Indexing of Moving Objects, Bx-Tree, Fig. 3 Jump in the index
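To illustrate how a two-dimensional window turns into several one-dimensional ranges, the following Python sketch enumerates the Z-values of all cells in a window and merges consecutive values into intervals. This brute-force enumeration is an assumption made for clarity and is not the traversal strategy of the index itself; `z_value` and `z_intervals` are hypothetical names, and the x-bit-first interleaving is likewise assumed.

```python
def z_value(x: int, y: int, order: int = 3) -> int:
    """Bit interleaving, x bit first (assumed bit order)."""
    z = 0
    for bit in range(order - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def z_intervals(x_lo: int, x_hi: int, y_lo: int, y_hi: int, order: int = 3):
    """Return the maximal runs of consecutive Z-values covered by the window."""
    values = sorted(z_value(x, y, order)
                    for x in range(x_lo, x_hi + 1)
                    for y in range(y_lo, y_hi + 1))
    intervals = []
    for z in values:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]


if __name__ == "__main__":
    print(z_intervals(2, 3, 2, 3))  # window aligned with the curve
    print(z_intervals(1, 2, 1, 2))  # same size, but misaligned
```

A window aligned with the curve collapses into one interval, whereas a misaligned window of the same size splits into several, each of which requires a separate jump to its start point.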

k Nearest Neighbor Query Algorithm

Assuming a set of N > k objects and given a query object with position \(q = (qx_{1}, qx_{2})\), the k nearest neighbor query (kNN query) retrieves k objects for which no other objects are nearer to the query object at time \(t_{q}\) not prior to the current time.

This query is computed by iteratively performing range queries with an incrementally enlarged search region until k answers are obtained. First, a range \(R_{q1}\) centered at q with extension \(r_{q} = D_{k}/k\) is constructed. \(D_{k}\) is the estimated distance between the query object and its kth nearest neighbor; \(D_{k}\) can be estimated by the following equation (Tao et al. 2004):

$$\displaystyle{ D_{k} = \frac{2} {\sqrt{\pi }}\left [1 -\sqrt{1 - \left ( \frac{k} {N}\right )^{1/2}}\right ]\;. }$$

The range query with range \(R_{q1}\) at time \(t_{q}\) is computed by enlarging it to a range \(R_{q1}^{\prime}\) and proceeding as described in the previous section. If at least k objects are currently covered by \(R_{q1}^{\prime}\) and are enclosed in the inscribed circle of \(R_{q1}\) at time \(t_{q}\), the kNN algorithm returns the k nearest objects and then stops. It is safe to stop because all the objects that can possibly be in the result have been considered. Otherwise, \(R_{q1}\) is extended by \(r_{q}\) to obtain \(R_{q2}\) and an enlarged window \(R_{q2}^{\prime}\). This time, the region \(R_{q2}^{\prime} - R_{q1}^{\prime}\) is searched, and the neighbor list is adjusted accordingly. This process is repeated until an \(R_{qi}\) is obtained such that there are k objects within its inscribed circle.
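The iteration can be sketched in Python under simplifying assumptions: the objects are given as static points already evaluated at \(t_{q}\), the data space is taken to be the unit square, and a linear scan stands in for the index range query and its window enlargement. For simplicity the sketch rescans the whole square in each round instead of searching only the newly added region. The names `estimate_dk` and `knn` are hypothetical.

```python
import math


def estimate_dk(k: int, n: int) -> float:
    """Estimated distance to the k-th nearest neighbor among n objects."""
    return (2 / math.sqrt(math.pi)) * (1 - math.sqrt(1 - math.sqrt(k / n)))


def knn(points, q, k):
    """Return the k points nearest to q by iteratively enlarging a square range."""
    n = len(points)
    if n <= k:
        raise ValueError("the sketch assumes N > k")
    r_q = estimate_dk(k, n) / k       # extension added in each round
    radius = r_q                      # half side length of the current square
    while True:
        # Stand-in for the range query on the index (linear scan).
        in_square = [p for p in points
                     if abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius]
        # Only objects inside the inscribed circle are guaranteed candidates.
        in_circle = sorted((math.dist(p, q), p) for p in in_square
                           if math.dist(p, q) <= radius)
        if len(in_circle) >= k:
            return [p for _, p in in_circle[:k]]
        radius += r_q


if __name__ == "__main__":
    pts = [((i * 37 % 101) / 101, (i * 53 % 103) / 103) for i in range(100)]
    print(knn(pts, (0.5, 0.5), 3))
```

Stopping only once k objects lie inside the inscribed circle matters because an object in a corner of the square may be farther away than an object just outside one of its sides.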

Continuous Query Algorithm

The queries considered so far in this section may be considered as one-time queries: they run once and complete when a result has been returned. Intuitively, a continuous query is a one-time query that is run at each point in time during a time interval. Further, a continuous query takes a now-relative time \(now + \Delta t_{q}\) as a parameter instead of the fixed time \(t_{q}\). The query then maintains the result of the corresponding one-time query at time \(now + \Delta t_{q}\) from when the query is issued at time \(t_{issue}\) until it is deactivated.

Such a query can be supported by a query \(q_{e}\) with time interval \([t_{issue} + \Delta t_{q}, t_{issue} + \Delta t_{q} + l]\) (“l” is a time interval) (Benetis et al. 2002). Query \(q_{e}\) can be computed by the algorithms presented previously, with relatively minor modifications: (i) use the end time of the time interval to perform forward enlargements, and use the start time of the time interval for backward enlargements; (ii) store the answer sets during the time interval. Then, from time \(t_{issue}\) to \(t_{issue} + l\), the answer to \(q_{l}\) is maintained during update operations. At \(t_{issue} + l\), a new query with time interval \([t_{issue} + \Delta t_{q} + l, t_{issue} + \Delta t_{q} + 2l]\) is computed.

A continuous range query during updates can be maintained by adding or removing the object from the answer set if the inserted or deleted object resides in the query window. Such operations only introduce CPU cost.

The maintenance of continuous kNN queries is somewhat more complex. Insertions also only introduce CPU cost: an inserted object is compared with the current answer set. Deletions of objects not in the answer set do not affect the query. However, if a deleted object is in the current answer set, the answer set is no longer valid. In this case, a new query with a time interval of length l at the time of the deletion is issued. If the deletion time is \(t_{del}\), a query with time interval \([t_{del} + \Delta t_{q}, t_{del} + \Delta t_{q} + l]\) is triggered at \(t_{del}\), and the answer set is maintained from \(t_{del}\) to \(t_{del} + l\).
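The maintenance rules for a continuous kNN query can be sketched in Python as follows. The sketch makes several simplifying assumptions: positions are treated as already evaluated at the query time, a full scan replaces the re-issued interval query, and inserted objects carry identifiers not yet present. The class name and the `recomputations` counter are hypothetical and serve only to show when a new query is triggered.

```python
import math


class ContinuousKnn:
    """Maintains the k nearest objects to q under insertions and deletions."""

    def __init__(self, q, k, objects):
        self.q, self.k = q, k
        self.objects = dict(objects)   # object id -> position
        self.recomputations = 0
        self._recompute()

    def _dist(self, oid):
        return math.dist(self.objects[oid], self.q)

    def _recompute(self):
        # Stand-in for issuing a new query against the index.
        self.answer = sorted(self.objects, key=self._dist)[:self.k]
        self.recomputations += 1

    def insert(self, oid, pos):
        # CPU cost only: compare the new object with the current answer set.
        self.objects[oid] = pos
        self.answer = sorted(self.answer + [oid], key=self._dist)[:self.k]

    def delete(self, oid):
        del self.objects[oid]
        if oid in self.answer:
            # The answer set is no longer valid, so a new query is issued.
            self._recompute()


if __name__ == "__main__":
    cq = ContinuousKnn((0, 0), 2, {"a": (0, 0), "b": (1, 0), "c": (5, 5)})
    cq.insert("d", (0.5, 0))
    cq.delete("c")   # not in the answer set: no recomputation
    cq.delete("a")   # in the answer set: triggers a recomputation
    print(cq.answer, cq.recomputations)
```

Only the deletion of an answer-set member forces a recomputation, which is why the choice of l matters more for kNN queries than for range queries.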

The choice of the “optimal” l value involves a trade-off between the cost of computing the query with the time interval and the cost of maintaining its result. On the one hand, a small l needs to be avoided, as this entails frequent recomputations of queries, which involve a substantial I/O cost. On the other hand, a large l also introduces a substantial cost: although computing one or a few queries is cost-effective in itself, the cost of maintaining the larger answer set must also be taken into account, which may generate additional I/Os on each update. Note that the maintenance of continuous range queries incurs only CPU cost. Thus, a range query with a relatively large l is computed such that l is bounded by \(\Delta t_{mu} - \Delta t_{q}\), since the answer set obtained at \(t_{issue}\) is no longer valid at \(t_{issue} + \Delta t_{mu}\). For continuous kNN queries, l needs to be carefully chosen.

Update, Insertion, and Deletion

Given a new object, its index key is calculated according to Eq. 1, and the object is then inserted into the Bx-tree as in the B+-tree. To delete an object, it is assumed that the positional information used at its last insertion and the last insertion time are known. Its index key is then calculated, and the same deletion algorithm as in the B+-tree is employed. Therefore, the Bx-tree directly inherits the good properties of the B+-tree, and efficient update performance is expected.

However, one should note that an update in the Bx-tree does differ from an update in the B+-tree. The Bx-tree only updates objects when their movement functions have changed. This is realized by clustering the updates during a certain period to one time point and maintaining several corresponding sub-trees. The total size of the three sub-trees is equal to that of one tree indexing all the objects.
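An update as a deletion of the old key followed by an insertion of the new key can be sketched in Python. The sketch uses a sorted list with the standard `bisect` module as a stand-in for the B+-tree, remembers each object's last key in a dictionary instead of recomputing it from the last insertion, assumes that projected positions are rounded to the nearest grid cell and stay within the domain, and reuses the parameter values of the worked example. All names are hypothetical.

```python
import bisect
import math

DT_MU, N, ORDER = 120, 2, 3
PHASE = DT_MU // N


def z_value(x: int, y: int) -> int:
    """Bit interleaving, x bit first (assumed bit order)."""
    z = 0
    for bit in range(ORDER - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def bx_key(pos, vel, t_u: int) -> int:
    """Index key: partition number concatenated with the Z-value at t_lab."""
    t_lab = math.ceil((t_u + PHASE) / PHASE) * PHASE
    partition = (t_lab // PHASE - 1) % (N + 1)
    x = round(pos[0] + vel[0] * (t_lab - t_u))   # assumed rounding to a cell
    y = round(pos[1] + vel[1] * (t_lab - t_u))
    return (partition << (2 * ORDER)) | z_value(x, y)


class SortedKeyIndex:
    """Sorted list of (key, object id) pairs standing in for the B+-tree."""

    def __init__(self):
        self.entries = []
        self.last_key = {}   # object id -> key used at the last insertion

    def insert(self, oid, pos, vel, t_u):
        key = bx_key(pos, vel, t_u)
        bisect.insort(self.entries, (key, oid))
        self.last_key[oid] = key
        return key

    def delete(self, oid):
        key = self.last_key.pop(oid)
        del self.entries[bisect.bisect_left(self.entries, (key, oid))]

    def update(self, oid, pos, vel, t_u):
        self.delete(oid)
        return self.insert(oid, pos, vel, t_u)


if __name__ == "__main__":
    idx = SortedKeyIndex()
    idx.insert("o1", (7, 2), (-0.1, 0.05), 0)
    idx.insert("o2", (1, 2), (0.0, 0.0), 100)
    idx.update("o1", (3, 3), (0.0, 0.0), 130)
    print(idx.entries)
```

Because the key depends only on the object's own position, velocity, and update time, an update touches at most two places in the sorted order and leaves all other entries unchanged.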

In some applications, there may be some object positions that are updated relatively rarely. For example, most objects may be updated at least every 10 min, but a few objects are updated once a day. Instead of letting outliers force a large maximum update interval, a “maximum update interval” within which a high percentage of objects have been updated is used. Object positions that are not updated within this interval are “flushed” to a new partition using their positions at the label timestamp of the new partition. In the example shown in Fig. 4, suppose that some object positions in T0 are not updated at the time when T0 expires. At this time, these objects are moved to T2. Although this introduces additional update cost, the (controllable) amortized cost is expected to be very small since outliers are rare. The forced movement of an object’s position to a new partition does not cause any problem with respect to locating the object, since the new partition can be calculated based on the original update time. Likewise, the query efficiency is not affected.

Indexing of Moving Objects, Bx-Tree, Fig. 4 Bx-tree evolution

Key Applications

With the advances in positioning technologies, such as GPS, and rapid developments of wireless communication devices, it is now possible to track continuously moving objects, such as vehicles, users of wireless devices and goods. The Bx-tree can be used in a number of emerging applications involving the monitoring and querying of large quantities of continuous variables, e.g., the positions of moving objects. In the following, some of these applications are discussed.

Location-Based Service

A traveller comes to a city that he is not familiar with. To start with, he sends his location by using his PDA or smart phone (equipped with GPS) to a local server that provides location-based services. Then the service provider can answer queries like “where is the nearest restaurant (or hotel)?” and can also help to dispatch a nearby taxi to the traveller.

A driver can also benefit from location-based services. For example, he can ask for the nearest gas station or motel while he is driving.

Traffic Control

If the moving objects database stores information about the locations of vehicles, it may be able to predict possible congestion in the near future. To avoid the congestion, the system can divert some vehicles to alternate routes in advance.

For air traffic control, the moving objects database system can retrieve all the aircraft within a certain region and help prevent a possible collision.

E-commerce

In these applications, stores send out advertisements or e-coupons to vehicles passing by or within the store region.

Digital Game

Another interesting example is the location-based digital game, where the positions of the mobile users play a central role. In such games, players need to locate their nearest neighbors to fulfill “tasks” such as “shooting” other close players via their mobile devices.

Battle Field

The moving object database technique is also very important in the military. With the help of the moving object database techniques, helicopters and tanks in the battlefield may be better positioned and mobilized to the maximum advantage.

RFID Application

Recently, applications using radio frequency identification (RFID) have received much interest. RFID enables data to be captured remotely via radio waves and stored on electronic tags embedded in their carriers. A reader (scanner) is then used to retrieve the information. In a hospital application, RFID tags are attached to all patients, nurses, and doctors, so that the system can track their movements in real time. If there is an emergency, nurses and doctors can be sent to the patients more quickly.

Future Directions

Several promising directions for future work exist. One could be the improvement of the range query performance in the Bx-tree since the current range query algorithm uses the strategy of enlarging query windows which may incur some redundant search. Also, the use of the Bx-tree for the processing of new kinds of queries can be considered. Another direction is the use of the Bx-tree for other continuous variables than the positions of mobile service users. Yet another direction is to apply the linearization technique to other index structures.

Cross-References