An algebraic transformation framework for multidatabase queries

Distributed and Parallel Databases

Abstract

The existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entity identification problem and the attribute value conflict problem. While the two-way outerjoin operation has been commonly used for resolving the entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys are shown to produce counter-intuitive entity identification results. We remedy this by defining a new key-equality comparator, used in place of the regular equality comparator, for outerjoins. For the attribute value conflict problem, we define a Generalized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. Once two-way outerjoin and GAD are added to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduce the constrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoin and GAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries.
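
To make the two operations named in the abstract more concrete, the following is a minimal sketch, not the paper's formal definitions. It pairs tuples from two component relations with a two-way outerjoin that uses regular equality on the key, then applies a user-defined derivation function to resolve an attribute value conflict, in the spirit of GAD. The relation contents, the attribute names `id` and `salary`, and the functions `two_way_outerjoin`, `gad`, and `average_or_available` are all hypothetical. The key-equality comparator is not reproduced here because the abstract does not define it, and keys are assumed to be unique within each relation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Pair = Tuple[Optional[Row], Optional[Row]]


def two_way_outerjoin(r: List[Row], s: List[Row], key: str) -> List[Pair]:
    """Pair rows of r and s on `key` using regular equality.

    Unmatched rows from either side are kept, paired with None.
    Assumes `key` is unique within each relation.
    """
    s_by_key = {row[key]: row for row in s}
    matched = set()
    pairs: List[Pair] = []
    for row in r:
        partner = s_by_key.get(row[key])
        if partner is not None:
            matched.add(row[key])
        pairs.append((row, partner))
    for row in s:
        if row[key] not in matched:
            pairs.append((None, row))
    return pairs


def gad(pairs: List[Pair], key: str,
        derive: Dict[str, Callable[[object, object], object]]) -> List[Row]:
    """Compute each integrated attribute with a user-defined function
    of the two component values (None when a side is missing)."""
    result: List[Row] = []
    for left, right in pairs:
        source = left if left is not None else right
        merged: Row = {key: source[key]}
        for attr, fn in derive.items():
            left_val = left.get(attr) if left is not None else None
            right_val = right.get(attr) if right is not None else None
            merged[attr] = fn(left_val, right_val)
        result.append(merged)
    return result


def average_or_available(a, b):
    """Hypothetical derivation function: average conflicting values,
    otherwise fall back to whichever value is present."""
    if a is not None and b is not None:
        return (a + b) / 2
    return a if a is not None else b


# Hypothetical component relations describing overlapping entities.
r = [{"id": 1, "salary": 100}, {"id": 2, "salary": 200}]
s = [{"id": 2, "salary": 220}, {"id": 3, "salary": 300}]

integrated = gad(two_way_outerjoin(r, s, "id"), "id",
                 {"salary": average_or_available})
print(integrated)
```

In this sketch the entity with `id` 2 appears in both relations with conflicting salaries, so the derivation function decides the integrated value, while entities 1 and 3 survive the outerjoin with the single value that is available.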

Additional information

Recommended by: Clement Yu

About this article

Cite this article

Lim, EP., Srivastava, J. & Hwang, SY. An algebraic transformation framework for multidatabase queries. Distrib Parallel Databases 3, 273–307 (1995). https://doi.org/10.1007/BF01418060

  • DOI: https://doi.org/10.1007/BF01418060
