
A query processing algorithm for a system of heterogeneous distributed databases

Distributed and Parallel Databases

Abstract

This paper presents a query processing algorithm formulated and developed to support the prototype architecture of the Distributed Access View Integrated Database (DAVID), a heterogeneous distributed database management system. The objective of the algorithm is to produce an inexpensive processing strategy for a given query. Such a strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations at each processing site. The join sequence is obtained by solving a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the Cartesian product of a list of atomic and non-atomic domains; it is the structure used to represent, in a uniform way, data stored in relational, hierarchical, and network databases.
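The idea of ranking semi-joins by profit, with sizes taken from value statistics rather than a distributional assumption, can be sketched in a few lines of Python. This is a minimal sketch, not DAVID's cost model: it assumes a classical SDD-1-style profit (bytes saved by shrinking a relation before it is shipped, minus bytes spent shipping the reducer's join-attribute values) and uses per-attribute value-frequency tables as the non-parametric statistics. `SiteRelation`, `estimated_semijoin_size`, `semijoin_profit`, `most_profitable_semijoins`, the byte widths, and the EMP/DEPT data are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class SiteRelation:
    """Hypothetical statistics for one relation (or flattened cluster) held at one site."""
    name: str
    site: str
    cardinality: int
    row_width: int                 # assumed average bytes per tuple
    freq: dict[str, Counter]       # attribute -> empirical value frequencies


def estimated_semijoin_size(r: SiteRelation, s: SiteRelation, attr: str) -> int:
    """Estimate the size of r semi-join s on attr directly from r's frequency table.

    No distributional (e.g. uniformity) assumption is made: the estimate is the
    total frequency of r's attr values that also occur among s's attr values.
    """
    s_values = set(s.freq[attr])
    return sum(n for value, n in r.freq[attr].items() if value in s_values)


def semijoin_profit(r: SiteRelation, s: SiteRelation, attr: str, value_width: int = 4) -> int:
    """Bytes no longer shipped once r is reduced, minus bytes spent shipping s's attr values."""
    cost = len(s.freq[attr]) * value_width
    benefit = (r.cardinality - estimated_semijoin_size(r, s, attr)) * r.row_width
    return benefit - cost


def most_profitable_semijoins(relations, join_attrs, value_width=4):
    """List (profit, reduced, reducer, attr) for all profitable candidates, best first."""
    candidates = []
    for r in relations:
        for s in relations:
            # Assumption: only cross-site semi-joins are considered, since a
            # same-site reduction saves no network transfer in this cost model.
            if r.site == s.site:
                continue
            for attr in join_attrs:
                if attr in r.freq and attr in s.freq:
                    profit = semijoin_profit(r, s, attr, value_width)
                    if profit > 0:
                        candidates.append((profit, r.name, s.name, attr))
    return sorted(candidates, reverse=True)


# Hypothetical example: EMP at site1, DEPT at site2, joined on "dept".
emp = SiteRelation("EMP", "site1", 6, 100, {"dept": Counter({"A": 3, "B": 2, "C": 1})})
dept = SiteRelation("DEPT", "site2", 1, 50, {"dept": Counter({"A": 1})})
print(most_profitable_semijoins([emp, dept], ["dept"]))  # [(296, 'EMP', 'DEPT', 'dept')]
```

Reducing EMP by DEPT removes three of six tuples (300 bytes) at the cost of shipping one 4-byte value, while the reverse reduction saves nothing and is discarded. The zero-one program that orders the remaining joins at each site is not sketched here.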

After background on the development of the DAVID prototype, this paper introduces the schema architecture, which describes the mechanism by which the schemata of the component heterogeneous databases are mapped into the uniform global schema. It then formulates the query processing algorithm, describes its implementation, and illustrates its use in the context of NASA's Astrophysics Data System.
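How data from different models can land in one uniform structure is easiest to see with a small example. The sketch below assumes clusters are encoded as nested relations: each tuple component is either an atomic value or a set of sub-tuples over a non-atomic domain. `Domain`, `Cluster`, `from_relational`, `from_hierarchical`, `unnest`, and the DEPT/EMP data are hypothetical illustrations, not DAVID's schema architecture or cluster algebra, and the mapping of network (CODASYL) data is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """A named domain; `nested` is None for an atomic domain, else its sub-domains."""
    name: str
    nested: Optional[tuple[Domain, ...]] = None

    @property
    def atomic(self) -> bool:
        return self.nested is None


@dataclass(frozen=True)
class Cluster:
    """A set of tuples drawn from the Cartesian product of `domains`.

    A component for an atomic domain is a plain value; a component for a
    non-atomic domain is itself a frozenset of tuples over its sub-domains.
    """
    domains: tuple[Domain, ...]
    tuples: frozenset


def from_relational(attrs, rows) -> Cluster:
    """A flat relational table maps to a cluster whose domains are all atomic."""
    return Cluster(tuple(Domain(a) for a in attrs), frozenset(tuple(r) for r in rows))


def from_hierarchical(parent_attrs, child_attrs, nested_name, records) -> Cluster:
    """Map parent segments and their child segments to one cluster with a nested domain.

    `records` is a list of (parent_values, [child_values, ...]) pairs.
    """
    child_domain = Domain(nested_name, tuple(Domain(a) for a in child_attrs))
    domains = tuple(Domain(a) for a in parent_attrs) + (child_domain,)
    tuples = frozenset(
        tuple(parent) + (frozenset(tuple(c) for c in children),)
        for parent, children in records
    )
    return Cluster(domains, tuples)


def unnest(cluster: Cluster, name: str) -> Cluster:
    """Flatten one non-atomic domain, pairing every child tuple with its parent's values.

    As in nested-relational unnest, a parent with no children yields no tuple.
    """
    idx = next(i for i, d in enumerate(cluster.domains) if d.name == name)
    nested = cluster.domains[idx]
    if nested.atomic:
        raise ValueError(f"domain {name!r} is atomic")
    domains = cluster.domains[:idx] + nested.nested + cluster.domains[idx + 1:]
    tuples = frozenset(
        t[:idx] + child + t[idx + 1:] for t in cluster.tuples for child in t[idx]
    )
    return Cluster(domains, tuples)


# Hypothetical data: a hierarchical DEPT -> EMP database and a relational EMP table.
tree = from_hierarchical(("dept",), ("id",), "emps", [(("A",), [(1,), (3,)]), (("B",), [(2,)])])
table = from_relational(("dept", "id"), [("A", 1), ("A", 3), ("B", 2)])
print(unnest(tree, "emps") == table)  # True
```

Because the unnested hierarchical cluster compares equal to the flat table, a query processor working over clusters can treat both sources alike; the equality check is only a demonstration device.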




Additional information

Recommended by: Y. Breitbart


About this article

Cite this article

Egyhazy, C.J., Triantis, K.P. & Bhasker, B. A query processing algorithm for a system of heterogeneous distributed databases. Distrib Parallel Databases 4, 49–79 (1996). https://doi.org/10.1007/BF00122148



  • DOI: https://doi.org/10.1007/BF00122148
