Identifying relevant constraints for semantic query optimization
Introduction
A database is required to be maintained in a valid state in which data values are restricted by certain constraints, namely, integrity constraints. These constraints, often referred to as semantic knowledge, can be functional dependencies, range restrictions, or even complicated rules that define relationships between attributes of relations. Such information can be used in enhancing the efficiency of query processing through a process known as Semantic Query Optimization (SQO). Formally, let Cl,…,Cn be integrity constraints (ICs). If the database is in a valid state, then a query Q is logically equivalent to Q∧Cl∧…∧Cn [10]. By applying transformations that preserve equivalence, different forms of the query can be generated. Among these we hope to choose the least expensive one to process.
Deductive database is a very convenient paradigm in which to analyze the interactions between the queries and ICs. The queries, relations, and ICs can all be represented in a single formalism, thus allowing full interactions. Using a relation, Teaches(Professor, Dept, Course, Lab), the query, “find professors who teach courses with programming lab”, and the integrity constraint, “only professors in the computer science department can teach courses with programming lab”, can be represented by Qa and IC1, respectively.
Qa: Teaches(P, _,_, ‘program’) /∗ P is a free variable, _ is a don't care condition∗/
IC1: DI=‘cs’←Teaches(P1, D1, C1, ‘program’) /∗ variables are universally quantified∗/
Teaches(P. _,_, ‘program’)∧∀(D1, P1, C1) (D1=‘cs’←Teaches(P1, D1, C1, ‘program’))
Teaches(P, _,_, ‘program’)∧D=‘cs’
IC2: Fdept=Sdept←Student(S#, Sname, Sdept, Advisor), Faculty(F#, Fname, Fdept),
Advisor=F#
Qb: Faculty(F#, _, ‘cs’)∨Student(_,_, ‘cs’, F#)
In this paper, we introduce a method of efficiently identifying relevant ICs for a given query at query processing time. The query may have any number of joins and unions of relations. We utilize a query graph in order to identify a set of relevant ICs for a given query. The head of a relevant IC can be used as a valid condition on the original query. Our objective is to collect, as many as possible and as efficiently as possible, such valid conditions from the set of ICs. The stage of actually reformulating the query using this information from the relevant ICs is beyond the scope of this paper (for related work, see [10], [18]).
Another type of knowledge that can be utilized in query processing is heuristics. Heuristics, when represented in Horn clauses, are similar in structure to integrity constraints. On the other hand, they differ from integrity constraints in that they represent guesses, hunches, or properties that are not verified. So, in query processing, use of such types of knowledge may result in an incomplete set of answers. Retrieving only a subset of answers from the database is often referred to as few answer retrieval (FAR). We propose vertical and horizontal partitioning techniques to support FAR. We also show how backtracking can be used to guarantee completeness in case the full set of answers for the original query is requested.
The remainder of this paper is organized as follows. In Section 2, we state our problem and define our scope. In Section 3, we introduce a way of selecting relevant ICs for a query involving joins and unions of only EDB relations. We generalize this method to queries involving IDB relations in Section 4. Issues on management and application of information obtained from relevant ICs are discussed in Section 5. We extend our discussions to heuristics in Section 6. Simulation results are presented in Section 7, and conclusions in Section 8.
Section snippets
The database
We assume the reader is familiar with terms and concepts in database query processing [9] and deductive databases [6]. The database consists of three major components; the EDB, IDB and the set of semantic knowledge clauses. We shall use the term database predicate to denote a predicate that represents a database relation, EDB or IDB. Other predicates (=, <, >, etc.) will be collectively called built-in predicates. Because complex objects will complicate unification and operations in the EDB
Queries with joins/unions of EDB relations
Let us assume that our query has a combination of joins and unions on EDB relations. If an IC contains only one database predicate in its body, it is relevant to the given query only if the predicate is in the query cluster. This is rather trivial so we will not consider ICs with only one database predicate in the body (such as I1 in Example 3). Furthermore, to simplify our discussions, we will consider only ICs with exactly two database predicates in the body. Generalization to ICs with more
Queries with IDB relations
The presence of IDB relations in a query complicates the task of selecting relevant ICs in two ways. First, it generates extensions which are logically equivalent to the query. Thus, an IC that may not seem relevant to the query itself may prove to be relevant to an extension of the query. This means the PRC selection process must consider not only the query expression itself but also its extensions, as shown in Example 1. Second, an IDB relation defined by more than one IDB rules introduces
Scope of a restriction
When the body of a remainder is satisfied, the head becomes a valid restriction. These restrictions are used in satisfying still other PRCs. These relationships are depicted in Fig. 5. The primary objective of the semantic analysis stage is to maximize the set of restrictions SR. A collected restriction can be applied only to the relevant portion of the query tree. We call this the scope of a restriction. We present in this section, how this scope can be practically managed.
Extension to few answer retrieval
An artificial intelligence application generally deals with a problem that seeks the best answer or few answers as opposed to seeking all answers, which is typical to a database application. A subset of possible answers should be found when not all answers are required and it is more efficient to do so, or only answers that satisfy certain qualifications (good, high, large, etc.) are wanted. This concept, which we refer to as few answer retrieval (FAR), has been actively pursued, although under
Simulations
A set of simulations have been run in order to demonstrate the performance characteristics of our proposal. The output measurement, which we simply call runtime (measured in milliseconds), is the average time it takes to identify the relevant constraints and collects restrictions for a query. The database contains 20 EDB predicates (base tables), 100 IDB rules, and 500 knowledge clauses (ICs and heuristics). The average number of database predicates in an IDB rule is 3 and that of knowledge
Conclusions
We have presented a way of identifying relevant knowledge clauses for a given query that may involve joins and unions on EDB and IDB relations. Our only assumption on the structure of the query is that the order of actual operations faithfully follows the structure of the given query (or equivalently, query tree). In other words, our approach is applicable to any structure of an AND/OR query without alteration. An alternative way of collecting relevant ICs for a join/union query would be to
References (22)
- et al.
Design and implementation of rock and roll: a deductive object-oriented database system
Information Systems
(1995) - et al.
Foundations of semantic query optimization for deductive databases
- et al.
Symbolic Logic and Mechanical Theorem Proving
(1973) - et al.
Programming in Prolog
(1984) - B. Freitag, O. Biernath, An Airtravel Expert Database, Proceedings of Third IEEE, International Conference on Data and...
- et al.
Logic and databases: a deductive approach
ACM Computing Surveys
(1984) - J. Han, L.J. Henschen, W. Lu, Search strategies for finding partial answers in large knowledge-bases, Proceedings of...
- M.M. Hammer, S.B. Zdonik, Knowledge based query processing. Proceedings of Sixth International Conference on VLDB, VLDB...
- et al.
Query optimization in database systems
ACM Computing Surveys
(1984) - J.J. King, Query Optimization by Semantic Reasoning, PhD Thesis, Department of CS, Stanford University, Stanford,...