Temporal Joins

Gao, Dengfeng

doi:10.1007/978-0-387-39940-9_401

Definition

A temporal join is a join operation on two temporal relations, in which each tuple has additional attributes indicating a time interval. The temporal join predicates include conventional join predicates as well as a temporal constraint that requires the overlap of the intervals of the two joined tuples. The result of a temporal join is a temporal relation.

In addition to binary temporal joins, which operate on two temporal relations, there are n-ary temporal joins that operate on more than two temporal relations. Likewise, temporal overlap is not the only temporal condition; others include “before” and “after” [1]. This entry concentrates on binary temporal joins with the overlap condition, since most previous work has focused on this kind of join.

Historical Background

In the past, temporal join operators have been defined in different temporal data models; at times, essentially the same operators have been given different names in different data models. Existing join algorithms have likewise been constructed within the contexts of different data models. Temporal join operators were first defined by Clifford and Croker [4]. Many later papers studied further temporal join operators and their evaluation algorithms. To enable the comparison of join definitions and implementations across data models, Gao et al. [7] proposed a taxonomy of temporal joins and then used this taxonomy to classify all previously defined temporal joins.

Foundations

The conventional relational joins that have long been accepted as “standard” [11] are the Cartesian product (whose “join predicate” is the constant expression TRUE), theta-join, equijoin, natural join, left and right outerjoin, and full outerjoin. Starting from this core set, a temporal counterpart can be defined that is a natural temporal generalization of it. The semantics of the temporal join operators are defined as follows.

To be specific, the definitions are based on a single data model that is used most widely in temporal data management implementations, namely the one that timestamps each tuple with an interval. Assume that the time-line is partitioned into minimal-duration intervals, termed chronons [5]. The intervals are denoted by inclusive starting and ending chronons.

Two temporal relational schemas, R and S, are defined as follows.

$$\eqalign{R = ({A}_{1},{ ...} ,{A}_{n},{{\rm T}}_{s},{{\rm T}}_{e}) \cr S = ({B}_{1},{ ...} ,{B}_{m},{{\rm T}}_{s},{{\rm T}}_{e})} $$

The A_i, 1 ≤ i ≤ n, and B_i, 1 ≤ i ≤ m, are the explicit attributes that are found in corresponding snapshot schemas, and T_s and T_e are the timestamp start and end attributes, recording when the information recorded by the explicit attributes holds (or held or will hold) true. T will be used as a shorthand for the interval [T_s, T_e], and A and B will be used as shorthands for {A_1,...,A_n} and {B_1,...,B_m}, respectively. Also, r and s are defined to be instances of R and S, respectively.

Table 1 Employee

Table 2 Manages

Table 3 Employee ×^T Manages

Consider the two temporal relations Employee and Manages. The relations show the canonical example of employees, the departments they work for, and the managers who supervise those departments.

Tuples in the relations represent facts about the modeled reality. For example, the first tuple in the Employee relation represents the fact that Ron worked for the Shipping department from time 1 to time 5, inclusive. Notice that none of the attributes, including the timestamp attributes T, are set-valued – the relation schemas are in 1NF.

Cartesian Product

The temporal Cartesian product is a conventional Cartesian product with a predicate on the timestamp attributes. To define it, two auxiliary definitions are needed.

First, intersect(U,V), where U and V are intervals, returns TRUE if there exists a chronon t such that t ∈ U ∧ t ∈ V, and FALSE otherwise. Second, overlap(U,V) returns the maximum interval contained in both of its argument intervals. If no such non-empty interval exists, the function returns the empty interval, ∅. To state this more precisely, let first and last return the smallest and largest of two argument chronons, respectively. Also let U_s and U_e denote the starting and ending chronons of U, and similarly for V.

$$\mathit{overlap}(U, V) = \begin{cases} [\mathit{last}(U_s, V_s),\ \mathit{first}(U_e, V_e)] & \text{if } \mathit{last}(U_s, V_s) \le \mathit{first}(U_e, V_e) \\ \emptyset & \text{otherwise.} \end{cases}$$
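The two auxiliary functions translate directly into code. The following Python sketch assumes that an interval is encoded as a pair `(start, end)` of inclusive integer chronons and that the empty interval is represented by `None`; both encodings are choices made for this illustration, not part of the definitions above.

```python
from typing import Optional, Tuple

# Assumed encoding: inclusive (start, end) chronons, as in [T_s, T_e].
Interval = Tuple[int, int]


def intersect(u: Interval, v: Interval) -> bool:
    """True if at least one chronon lies in both u and v."""
    return max(u[0], v[0]) <= min(u[1], v[1])


def overlap(u: Interval, v: Interval) -> Optional[Interval]:
    """Maximum interval contained in both u and v; None stands for the empty interval."""
    start = max(u[0], v[0])  # last(U_s, V_s)
    end = min(u[1], v[1])    # first(U_e, V_e)
    return (start, end) if start <= end else None
```

Because the end points are inclusive, two intervals that share only a single chronon, such as [1, 5] and [5, 9], still overlap.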

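The temporal Cartesian product, r ×^T s, of two temporal relations r and s is defined as follows.

$$\eqalign{r \times^{\rm T} s = \{ z^{(n+m+2)} \mid\ &\exists x \in r\ \exists y \in s\ (\mathit{intersect}(x[{\rm T}], y[{\rm T}]) \wedge z[A] = x[A] \wedge z[B] = y[B]\ \wedge \cr &z[{\rm T}] = \mathit{overlap}(x[{\rm T}], y[{\rm T}]) \wedge z[{\rm T}] \neq \emptyset)\}}$$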

The first line of the definition ensures that matching tuples x and y have overlapping timestamps and sets the explicit attribute values of the result tuple z to the concatenation of the explicit attribute values of x and y. The second line computes the timestamp of z and ensures that it is non-empty. The intersect predicate is included only for later reference – it may be omitted without changing the meaning of the definition.

Consider the query “Show the names of employees and managers where the employee worked for the company while the manager managed some department in the company.” This can be satisfied using the temporal Cartesian product.

The overlap function is necessary and sufficient to ensure snapshot reducibility, as will be discussed in detail later. Basically, the temporal Cartesian product acts as though it is a conventional Cartesian product applied independently at each point in time. When operating on interval-stamped data, this semantics corresponds to an intersection: the result will be valid during those times when contributing tuples from both input relations are valid.
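As a minimal sketch of the definition, the temporal Cartesian product can be written as a nested loop that keeps only pairs with a non-empty overlap. The tuple encoding `(explicit_attributes, (start, end))` is an assumption of this example. Apart from the Ron/Shipping tuple mentioned earlier, the sample tuples are hypothetical and are not taken from the entry's tables.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]          # inclusive (start, end) chronons
Row = Tuple[tuple, Interval]        # (explicit attribute values, timestamp)


def overlap(u: Interval, v: Interval) -> Optional[Interval]:
    start, end = max(u[0], v[0]), min(u[1], v[1])
    return (start, end) if start <= end else None


def temporal_product(r: List[Row], s: List[Row]) -> List[Row]:
    """Pair every x in r with every y in s whose timestamps overlap."""
    out = []
    for x_attrs, x_t in r:
        for y_attrs, y_t in s:
            t = overlap(x_t, y_t)
            if t is not None:                      # z[T] must be non-empty
                out.append((x_attrs + y_attrs, t))
    return out


# Hypothetical sample data (only the Ron tuple appears in the text).
employee = [(("Ron", "Shipping"), (1, 5)), (("Ann", "Loading"), (4, 9))]
manages = [(("Ed", "Shipping"), (3, 8))]
result = temporal_product(employee, manages)
```

Each result tuple is stamped with the intersection of its contributing timestamps, which is what makes the operator behave like a conventional product at every chronon.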

Theta-Join

Like the conventional theta-join, the temporal theta-join supports an unrestricted predicate P on the explicit attributes of its input arguments. The temporal theta-join, r ⋈_P^T s, of two relations r and s selects those tuples from r ×^T s that satisfy predicate P(r[A],s[B]). Let σ denote the standard selection operator.

The temporal theta-join, r ⋈_P^T s, of two temporal relations r and s is defined as follows.

$$r\ {\rm { \bowtie }_{ P}^{{\rm T}}}s = {\sigma }_{ P(r[A],s[B])}(r{\rm {\times }^{{\rm T}}}s)$$

Equijoin

Like snapshot equijoin, the temporal equijoin operator enforces equality matching between specified subsets of the explicit attributes of the input relations.

The temporal equijoin on two temporal relations r and s on attributes A′⊆ A and B′⊆ B is defined as the theta-join with predicate P ≡ r[A′] = s[B′]:

$$r\ { \bowtie }_{ r[A']=s[B']}^{{\rm T} }s\ .$$

Natural Join

The temporal natural join bears the same relationship to the temporal equijoin as do their snapshot counterparts. Namely, the temporal natural join is simply a temporal equijoin on identically named explicit attributes, followed by a projection operation.

To define this join, the relation schemas are augmented with explicit join attributes, C_i, 1 ≤ i ≤ k, which are abbreviated by C.

$$\eqalign{R &=({A}_{1}, { ...} , {A}_{n},{C}_{1}, { ...} , {C}_{k},{{\rm T}}_{s},{{\rm T}}_{e}) \cr S &= ({B}_{1},{ ...}, {B}_{m},{C}_{1},{ ...} , {C}_{k},{{\rm T}}_{s},{{\rm T}}_{e})}$$

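The temporal natural join of r and s, r ⋈^T s, is defined as follows.

$$\eqalign{r \bowtie^{\rm T} s = \{ z^{(n+m+k+2)} \mid\ &\exists x \in r\ \exists y \in s\ (x[C] = y[C]\ \wedge \cr &z[A] = x[A] \wedge z[B] = y[B] \wedge z[C] = y[C]\ \wedge \cr &z[{\rm T}] = \mathit{overlap}(x[{\rm T}], y[{\rm T}]) \wedge z[{\rm T}] \neq \emptyset)\}}$$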

The first two lines ensure that tuples x and y agree on the values of the join attributes C and set the explicit attributes of the result tuple z to the concatenation of the non-join attributes A and B and a single copy of the join attributes, C. The third line computes the timestamp of z as the overlap of the timestamps of x and y, and ensures that x[T] and y[T] actually overlap.

The temporal natural join plays the same important role in reconstructing normalized temporal relations as does the snapshot natural join for normalized snapshot relations [10]. Most previous work in temporal join evaluation has addressed, either implicitly or explicitly, the implementation of the temporal natural join (or the closely related temporal equijoin).
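To illustrate how the natural join keeps a single copy of the shared attributes, the following Python sketch represents the explicit attributes of a tuple as a dictionary keyed by attribute name. The attribute names `EmpName`, `Dept`, and `MgrName` and the sample tuples are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]
Row = Tuple[Dict[str, str], Interval]   # (named explicit attributes, timestamp)


def overlap(u: Interval, v: Interval) -> Optional[Interval]:
    start, end = max(u[0], v[0]), min(u[1], v[1])
    return (start, end) if start <= end else None


def temporal_natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Join on all identically named attributes and intersect the timestamps."""
    out = []
    for x, xt in r:
        for y, yt in s:
            common = x.keys() & y.keys()            # the join attributes C
            if all(x[c] == y[c] for c in common):
                t = overlap(xt, yt)
                if t is not None:
                    out.append(({**x, **y}, t))      # one copy of each C attribute
    return out


employee = [({"EmpName": "Ron", "Dept": "Shipping"}, (1, 5))]
manages = [({"Dept": "Shipping", "MgrName": "Ed"}, (3, 8))]
joined = temporal_natural_join(employee, manages)
```

Merging the two dictionaries plays the role of the projection that removes the duplicate join attributes.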

Outerjoins and Outer Cartesian Products

Like the snapshot outerjoin, temporal outerjoins and Cartesian products retain dangling tuples, i.e., tuples that do not participate in the join. However, in a temporal database, a tuple may dangle over a portion of its time interval and be covered over others; this situation must be accounted for in a temporal outerjoin or Cartesian product.

The temporal outerjoin may be defined as the union of two subjoins, analogous to the snapshot outerjoin. The two subjoins are the temporal left outerjoin and the temporal right outerjoin. As the left and right outerjoins are symmetric, only the left outerjoin is defined here.

Two auxiliary functions are needed. The coalesce function collapses value-equivalent tuples – tuples with mutually equal non-timestamp attribute values [9] – in a temporal relation into a single tuple with the same non-timestamp attribute values and a timestamp that is the finite union of intervals that precisely contains the chronons in the timestamps of the value-equivalent tuples. (Finite unions of time intervals are termed temporal elements [6].) The definition of coalesce uses the function chronons that returns the set of chronons contained in the argument interval.

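$$\eqalign{\mathit{coalesce}(r) = \{ z^{(n+2)} \mid\ &\exists x \in r\ (z[A] = x[A] \Rightarrow \mathit{chronons}(x[{\rm T}]) \subseteq z[{\rm T}]\ \wedge \cr &\forall x'' \in r\ (x[A] = x''[A] \Rightarrow (\mathit{chronons}(x''[{\rm T}]) \subseteq z[{\rm T}])))\ \wedge \cr &\forall t \in z[{\rm T}]\ \exists x'' \in r\ (z[A] = x''[A] \wedge t \in \mathit{chronons}(x''[{\rm T}]))\}}$$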

The first two lines of the definition coalesce all value-equivalent tuples in relation r. The third line ensures that no spurious chronons are generated.
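Operationally, coalescing amounts to grouping tuples by their explicit attribute values and taking the union of their chronons. The Python sketch below assumes integer chronons and represents a temporal element as a plain set of chronons; the returned dictionary, keyed by the explicit attribute values, is a convenience of this example rather than the relation form used in the definition.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]
Row = Tuple[tuple, Interval]


def chronons(interval: Interval) -> Set[int]:
    """All chronons in an inclusive interval (start, end)."""
    start, end = interval
    return set(range(start, end + 1))


def coalesce(rel: List[Row]) -> Dict[tuple, Set[int]]:
    """Merge value-equivalent tuples into one entry holding a temporal element."""
    merged: Dict[tuple, Set[int]] = defaultdict(set)
    for attrs, t in rel:
        merged[attrs] |= chronons(t)   # union adds no chronon that no tuple covers
    return dict(merged)


# Hypothetical data: two value-equivalent tuples with overlapping timestamps.
rel = [
    (("Ron", "Shipping"), (1, 5)),
    (("Ron", "Shipping"), (4, 8)),
    (("Ron", "Mail"), (10, 12)),
]
elements = coalesce(rel)
```

Materializing every chronon is only practical for small examples; an implementation over large time ranges would merge sorted intervals instead.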

Now a function expand is defined that returns the set of maximal intervals contained in an argument temporal element, T. Prior to defining expand an auxiliary function intervals is defined that returns the set of intervals contained in an argument temporal element.

$$\eqalign{\mathit{intervals}(T) = \{ [t_s, t_e] \mid\ &t_s \in T \wedge t_e \in T\ \wedge \cr &\forall t \in \mathit{chronons}([t_s, t_e])\ (t \in T)\}}$$

The first two conditions ensure that the beginning and ending chronons of the interval are elements of T. The third condition ensures that the interval is contiguous within T.

Using intervals, expand is defined as follows.

$$\eqalign{\mathit{expand}(T) = \{ [t_s, t_e] \mid\ &[t_s, t_e] \in \mathit{intervals}(T)\ \wedge \cr &\neg\exists [t_s', t_e'] \in \mathit{intervals}(T)\ (\mathit{chronons}([t_s, t_e]) \subset \mathit{chronons}([t_s', t_e']))\}}$$

The first line ensures that a member of the result is an interval contained in T. The second line ensures that the interval is indeed maximal.
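For a temporal element stored as a set of integer chronons (an assumption of this illustration), the maximal intervals can be found with a single pass over the sorted chronons, without enumerating all of intervals(T). The following Python sketch computes the same result as expand under that assumption.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def expand(element: Iterable[int]) -> List[Interval]:
    """Maximal inclusive intervals contained in a set of chronons."""
    out: List[Interval] = []
    for t in sorted(element):
        if out and t == out[-1][1] + 1:
            out[-1] = (out[-1][0], t)   # t extends the current contiguous run
        else:
            out.append((t, t))          # a gap was found: start a new run
    return out


runs = expand({1, 2, 3, 7, 8, 10})
```

Here `runs` is `[(1, 3), (7, 8), (10, 10)]`: each interval is contiguous within the element and cannot be extended on either side.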

The temporal left outerjoin is now ready to be defined. Let R and S be defined as for the temporal equijoin. A′⊆ A and B′⊆ B are used as the explicit join attributes.

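The temporal left outerjoin, r ⟕^T_{r[A′]=s[B′]} s, of two temporal relations r and s is defined as follows.

$$\eqalign{r \mathbin{⟕}^{\rm T}_{r[A']=s[B']} s = \{ z^{(n+m+2)} \mid\ &\exists x \in \mathit{coalesce}(r)\ \exists y \in \mathit{coalesce}(s) \cr &(x[A'] = y[B'] \wedge z[A] = x[A] \wedge z[{\rm T}] \neq \emptyset\ \wedge \cr &((z[B] = y[B] \wedge z[{\rm T}] \in \{\mathit{expand}(x[{\rm T}] \cap y[{\rm T}])\})\ \vee \cr &(z[B] = null \wedge z[{\rm T}] \in \{\mathit{expand}(x[{\rm T}]) - \mathit{expand}(y[{\rm T}])\})))\ \vee \cr &\exists x \in \mathit{coalesce}(r)\ \forall y \in \mathit{coalesce}(s) \cr &(x[A'] \neq y[B'] \Rightarrow z[A] = x[A] \wedge z[B] = null\ \wedge \cr &z[{\rm T}] \in \mathit{expand}(x[{\rm T}]) \wedge z[{\rm T}] \neq \emptyset)\}}$$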

The first four lines of the definition handle the case where, for a tuple x deriving from the left argument, a tuple y with matching explicit join attribute values is found. For those time intervals of x that are not shared with y, tuples with null values in the attributes of y are generated. The final three lines of the definition handle the case where no matching tuple y is found. Tuples with null values in the attributes of y are generated.

The temporal outerjoin may be defined as simply the union of the temporal left and the temporal right outerjoins (the union operator eliminates the duplicate equijoin tuples). Similarly, a temporal outer Cartesian product is a temporal outerjoin without the equijoin condition (A′ = B′ = ∅).
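The left outerjoin can be sketched by combining coalesce and expand. The Python code below is an interpretation rather than a line-by-line transcription of the definition: it assumes that a left tuple dangles at exactly those chronons that are covered by no matching right tuple, and it uses `None` for null. The helper signatures, positional indexes, and sample data are hypothetical.

```python
from collections import defaultdict


def chronons(interval):
    """Set of chronons in an inclusive interval (start, end)."""
    return set(range(interval[0], interval[1] + 1))


def coalesce(rel):
    """Map each explicit-attribute tuple to the union of its chronons."""
    merged = defaultdict(set)
    for attrs, t in rel:
        merged[attrs] |= chronons(t)
    return dict(merged)


def expand(element):
    """Maximal inclusive intervals contained in a set of chronons."""
    out = []
    for t in sorted(element):
        if out and t == out[-1][1] + 1:
            out[-1] = (out[-1][0], t)
        else:
            out.append((t, t))
    return out


def temporal_left_outerjoin(r, s, a_idx, b_idx, b_width):
    """Equijoin on r[a_idx] = s[b_idx]; uncovered parts of r are null-padded."""
    cs = coalesce(s)
    out = []
    for xa, x_elem in coalesce(r).items():
        key = tuple(xa[i] for i in a_idx)
        covered = set()
        for yb, y_elem in cs.items():
            if key == tuple(yb[j] for j in b_idx):
                shared = x_elem & y_elem
                covered |= shared
                out.extend((xa + yb, iv) for iv in expand(shared))
        nulls = (None,) * b_width
        # Chronons of x matched by no y yield null-padded (dangling) tuples.
        out.extend((xa + nulls, iv) for iv in expand(x_elem - covered))
    return out


employee = [(("Ron", "Shipping"), (1, 5)), (("Ann", "Loading"), (4, 9))]
manages = [(("Ed", "Shipping"), (3, 8))]
result = temporal_left_outerjoin(employee, manages, [1], [1], 2)
```

In this example Ron is matched with Ed during [3, 5] and dangles during [1, 2], while Ann, who has no matching manager tuple, dangles over her whole interval [4, 9].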

Table 1 summarizes how previous work is represented in the taxonomy. For each operator defined in previous work, the table lists the defining publication, researchers, the corresponding taxonomy operator, and any restrictions assumed by the original operators.

Temporal Joins. Table 1 Temporal join operators

In early work, Clifford [4] indicated that an INTERSECTION-JOIN should be defined that represents the categorized non-outer joins and Cartesian products, and he proposed that a UNION-JOIN be defined for the outer variants.

Reducibility

The following shows how the temporal operators reduce to snapshot operators. Reducibility guarantees that the semantics of a snapshot operator are preserved in its more complex, temporal counterpart.

For example, the semantics of the temporal natural join reduces to the semantics of the snapshot natural join in that the result of first joining two temporal relations and then transforming the result to a snapshot relation yields a result that is the same as that obtained by first transforming the arguments to snapshot relations and then joining the snapshot relations. This commutativity diagram is shown in Fig. 1 and stated formally in the first equality of the following theorem.

Temporal Joins. Figure 1

The timeslice operation τ^T takes a temporal relation r as argument and a chronon t as parameter. It returns the corresponding snapshot relation, i.e., with the schema of r, but without the timestamp attributes, that contains (the non-timestamp portion of) all tuples x from r for which t belongs to x[T]. It follows from the next theorem that the temporal joins defined here reduce to their snapshot counterparts.

Theorem 1

Let t denote a chronon and let r and s be relation instances of the proper types for the operators they are applied to. Then the following hold for all t:

Due to the space limit, the proof of this theorem is not provided here. The details can be found in the related paper [7].
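The reducibility property can be spot-checked on small data. The following Python sketch uses a hypothetical tuple encoding `(explicit_attributes, (start, end))` and checks, for the temporal Cartesian product, that timeslicing the temporal result gives the same snapshot relation as taking the conventional product of the timeslices, i.e., τ^T_t(r ×^T s) = τ^T_t(r) × τ^T_t(s), where the subscript placement of t is an assumed notation. It is an illustration on sample data, not a proof.

```python
def overlap(u, v):
    start, end = max(u[0], v[0]), min(u[1], v[1])
    return (start, end) if start <= end else None


def temporal_product(r, s):
    """Temporal Cartesian product over (attrs, (start, end)) tuples."""
    out = []
    for xa, xt in r:
        for ya, yt in s:
            t = overlap(xt, yt)
            if t is not None:
                out.append((xa + ya, t))
    return out


def timeslice(rel, t):
    """Snapshot at chronon t: explicit attributes of tuples valid at t."""
    return {attrs for attrs, (start, end) in rel if start <= t <= end}


def snapshot_product(a, b):
    """Conventional Cartesian product of two snapshot relations."""
    return {x + y for x in a for y in b}


# Hypothetical sample data.
employee = [(("Ron", "Shipping"), (1, 5)), (("Ann", "Loading"), (4, 9))]
manages = [(("Ed", "Shipping"), (3, 8))]

reducible = all(
    timeslice(temporal_product(employee, manages), t)
    == snapshot_product(timeslice(employee, t), timeslice(manages, t))
    for t in range(0, 12)
)
```

The check passes because a result tuple is valid at t exactly when both contributing tuples are valid at t, which is what stamping the result with the overlap achieves.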

Evaluation Algorithms

Algorithms for temporal join evaluation are necessarily more complex than their snapshot counterparts. Whereas snapshot evaluation algorithms match input tuples on their explicit join attributes, temporal join evaluation algorithms typically must in addition ensure that temporal restrictions are met. Furthermore, this problem is exacerbated in two ways. Timestamps are typically complex data types, e.g., intervals, requiring inequality predicates, which conventional query processors are not optimized to handle. Also, a temporal database is usually larger than a corresponding snapshot database due to the versioning of tuples.

There are two categories of evaluation algorithms. Index-based algorithms use an auxiliary access path, i.e., a data structure that identifies tuples or their locations using a join-attribute value. Non-index-based algorithms do not employ auxiliary access paths. A large number of temporal indexes have been proposed in the literature [12]. Gao et al. [7] provided a taxonomy of non-index-based temporal join algorithms.
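To give a flavor of what a non-index-based evaluation might look like, the following Python sketch computes the temporal Cartesian product with a generic sort-based sweep over start chronons. It is an illustrative technique chosen for this example, not one of the algorithms classified in [7], and the tuple encoding is assumed.

```python
def sweep_temporal_product(r, s):
    """Overlap join by sweeping tuples of both inputs in start-time order."""
    events = sorted(
        [(ts, 0, attrs, te) for attrs, (ts, te) in r]
        + [(ts, 1, attrs, te) for attrs, (ts, te) in s],
        key=lambda e: e[0],
    )
    active_r, active_s = [], []   # (attrs, end) of tuples that may still overlap
    out = []
    for ts, side, attrs, te in events:
        # Discard tuples that ended before the current start chronon.
        active_r = [a for a in active_r if a[1] >= ts]
        active_s = [a for a in active_s if a[1] >= ts]
        if side == 0:             # tuple from r: pair with active s tuples
            for ya, y_end in active_s:
                out.append((attrs + ya, (ts, min(te, y_end))))
            active_r.append((attrs, te))
        else:                     # tuple from s: pair with active r tuples
            for xa, x_end in active_r:
                out.append((xa + attrs, (ts, min(x_end, te))))
            active_s.append((attrs, te))
    return out


# Hypothetical sample data.
employee = [(("Ron", "Shipping"), (1, 5)), (("Ann", "Loading"), (4, 9))]
manages = [(("Ed", "Shipping"), (3, 8))]
swept = sweep_temporal_product(employee, manages)
```

Every active tuple starts no later than the arriving tuple and ends no earlier than its start, so the overlap is always the arriving start chronon up to the smaller of the two end chronons. Each overlapping pair is reported once, when the later-starting tuple arrives.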

Key Applications

Temporal joins are used to model relationships between temporal relations with respect to the temporal dimensions. Data warehouses usually need to store and analyze historical data. Temporal joins can be used (alone or together with other temporal relational operators) to perform the analysis on historical data.

Cross-references

Temporal Query Processing

Author information

Authors and Affiliations

IBM Silicon Valley Lab, San Jose, CA, USA
Dengfeng Gao

Editor information

Editors and Affiliations

College of Computing, Georgia Institute of Technology, 266 Ferst Drive, 30332-0765, Atlanta, GA, USA
LING LIU (Professor)
Database Research Group David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, N2L 3G1, Waterloo, ON, Canada
M. TAMER ÖZSU (Professor and Director, University Research Chair)

Cite this entry

Gao, D. (2009). Temporal Joins. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_401

DOI: https://doi.org/10.1007/978-0-387-39940-9_401
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-35544-3
Online ISBN: 978-0-387-39940-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering