ABSTRACT
The disparate and geographically distributed data sources in an enterprise can be integrated using distributed computing technologies such as data grids. The real challenge in such data integration efforts lies in the design and development of the distributed query processing engine that underlies these integrated systems. In the current literature, distributed query processing and optimization is carried out in three distinct phases: (1) creation of a single-node plan, (2) generation of a parallel plan, and (3) optimal site selection for plan execution. Because considering the three phases in isolation leads to sub-optimal plans, this paper proposes a new distributed query optimization model that integrates all three phases of query optimization. The paper also presents different heuristic approaches for solving the proposed integrated distributed query processing problem. Furthermore, the presented system is integrated with a data grid solution, and several real-time experiments are conducted to demonstrate its usefulness.
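The claim that optimizing the phases in isolation can yield sub-optimal plans can be illustrated with a toy sketch. This is not the paper's model or its heuristics: the relations, row counts, intermediate result sizes, sites, cost formulas, and function names below are all hypothetical, and the parallel-plan phase is omitted for brevity. The sketch only contrasts choosing a join order first and sites second with searching both choices jointly.

```python
from itertools import product

# Hypothetical toy catalog: relation -> (row count, home site).
RELATIONS = {"A": (1000, "s1"), "B": (1000, "s2"), "C": (1000, "s2")}
# Hypothetical result size of the first join for each starting pair.
JOIN_SIZE = {frozenset("AB"): 100, frozenset("BC"): 200, frozenset("AC"): 5000}
SITES = ("s1", "s2")
SHIP_COST_PER_ROW = 1.0  # assumed network cost per shipped row

# Each order (x, y, z) means: join x with y first, then join the result with z.
ORDERS = [("A", "B", "C"), ("B", "C", "A"), ("A", "C", "B")]


def cpu_cost(order):
    """Site-independent cost: rows read by the two joins."""
    x, y, z = order
    mid = JOIN_SIZE[frozenset((x, y))]
    return RELATIONS[x][0] + RELATIONS[y][0] + mid + RELATIONS[z][0]


def ship_cost(order, site1, site2):
    """Network cost when join 1 runs at site1 and join 2 runs at site2."""
    x, y, z = order
    # Base relations not stored at site1 must be shipped there.
    rows = sum(RELATIONS[r][0] for r in (x, y) if RELATIONS[r][1] != site1)
    # The intermediate result moves only if the two joins run at different sites.
    if site1 != site2:
        rows += JOIN_SIZE[frozenset((x, y))]
    # The third relation is shipped if it is not stored at site2.
    if RELATIONS[z][1] != site2:
        rows += RELATIONS[z][0]
    return SHIP_COST_PER_ROW * rows


def total_cost(order, site1, site2):
    return cpu_cost(order) + ship_cost(order, site1, site2)


def phased():
    """Pick the join order ignoring sites, then pick sites for that order."""
    order = min(ORDERS, key=cpu_cost)
    sites = min(product(SITES, repeat=2), key=lambda s: ship_cost(order, *s))
    return order, sites, total_cost(order, *sites)


def integrated():
    """Search join order and site assignment jointly."""
    candidates = ((o, s) for o in ORDERS for s in product(SITES, repeat=2))
    order, sites = min(candidates, key=lambda c: total_cost(c[0], *c[1]))
    return order, sites, total_cost(order, *sites)


if __name__ == "__main__":
    print("phased:    ", phased())
    print("integrated:", integrated())
```

With these made-up numbers, the phased search commits to joining A and B first because that order reads the fewest rows, and then pays to ship A across the network. The joint search instead joins B and C where they are co-located and ships only the smaller intermediate result, for a lower total cost. Exhaustive enumeration is used here only because the toy space is tiny; it stands in for, and does not reproduce, the heuristic approaches the abstract mentions.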
Index Terms
- An integrated query optimization system for data grids