A query processing algorithm for a system of heterogeneous distributed databases

Egyhazy, Csaba J.; Triantis, Konstantinos P.; Bhasker, Bharat

doi:10.1007/BF00122148

Published: January 1996

Distributed and Parallel Databases, Volume 4, pages 49–79 (1996)

Abstract

This paper presents a query processing algorithm formulated and developed in support of the prototype architecture of the Distributed Access View Integrated Database (DAVID), a heterogeneous distributed database management system. The objective of the algorithm is to produce an inexpensive strategy for a given query. This strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations per processing site. The join sequence is obtained by applying a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the Cartesian product of a list of atomic and non-atomic domains; it is the structure that can represent, in a uniform way, data stored in relational, hierarchical, and network databases.

Following some background information on the development of the DAVID prototype, this paper introduces the schema architecture. The schema architecture describes the mechanism by which the component heterogeneous database schemata are mapped into the uniform global schema. This is followed by the formulation of the query processing algorithm, its implementation and an illustration of its use in the context of NASA's Astrophysics Data System.
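
To make the notion of a "profitable" semi-join in the abstract concrete, the following is a minimal sketch, not the cost model used in DAVID. It assumes a simple byte-counting model in which a semi-join is worthwhile when the bytes it removes from the target exceed the bytes shipped to perform it. The names `SemiJoin`, `profit`, and `most_profitable`, and all sizes and selectivities, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiJoin:
    """Hypothetical candidate: reduce `target` using join values from `source`."""
    source: str
    target: str
    shipped_bytes: int   # size of the projected join column sent to the target site
    target_bytes: int    # current size of the target before reduction
    selectivity: float   # estimated fraction of the target that survives


def profit(sj: SemiJoin) -> float:
    """Assumed model: bytes removed from the target minus bytes shipped."""
    benefit = sj.target_bytes * (1.0 - sj.selectivity)
    return benefit - sj.shipped_bytes


def most_profitable(candidates):
    """Keep only candidates with positive profit, ordered best first."""
    gainful = [sj for sj in candidates if profit(sj) > 0]
    return sorted(gainful, key=profit, reverse=True)


# Illustrative numbers only; they do not come from the paper.
candidates = [
    SemiJoin("R", "S", shipped_bytes=2_000, target_bytes=100_000, selectivity=0.25),
    SemiJoin("S", "R", shipped_bytes=8_000, target_bytes=40_000, selectivity=0.9),
    SemiJoin("T", "S", shipped_bytes=1_000, target_bytes=100_000, selectivity=0.5),
]
ranked = most_profitable(candidates)
for sj in ranked:
    print(sj.source, "->", sj.target, profit(sj))
```

In this sketch the candidate from S to R is discarded because shipping its join column costs more than the reduction saves. A real optimizer would also need size estimates for the intermediate results, which the paper obtains with a non-parametric statistical technique.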

Author information

Authors and Affiliations

Department of Computer Science, Virginia Polytechnic Institute and State University, Northern Virginia Graduate Center, 2990 Telestar Court, Falls Church, VA 22042
Csaba J. Egyhazy
Department of Industrial & Systems Engineering, Virginia Polytechnic Institute and State University, Northern Virginia Graduate Center, 2990 Telestar Court, Falls Church, VA 22042
Konstantinos P. Triantis
MDL Information Systems Inc., 14600 Catalina St, San Leandro, CA 94577
Bharat Bhasker

Additional information

Recommended by: Y. Breitbart

About this article

Cite this article

Egyhazy, C.J., Triantis, K.P. & Bhasker, B. A query processing algorithm for a system of heterogeneous distributed databases. Distrib Parallel Databases 4, 49–79 (1996). https://doi.org/10.1007/BF00122148

Received: 16 March 1992
Accepted: 13 March 1995
Issue Date: January 1996
DOI: https://doi.org/10.1007/BF00122148
