A novel approach to resource scheduling for parallel query processing on computational grids

Distributed and Parallel Databases

Abstract

Advances in network technologies and the emergence of Grid computing have both increased the need and provided the infrastructure for computation- and data-intensive applications to run over collections of heterogeneous and autonomous nodes. In the context of database query processing, existing parallelisation techniques cannot operate well in Grid environments because the way they select machines and allocate tasks compromises partitioned parallelism. The main contribution of this paper is a low-complexity, practical resource selection and scheduling algorithm that enables queries to employ partitioned parallelism and thereby achieve better performance in a Grid setting. The evaluation results show that the proposed scheduler outperforms current techniques without sacrificing the efficiency of resource utilisation.

Corresponding author

Correspondence to Anastasios Gounaris.

Additional information

Recommended by: Ioannis Vlahavas

About this article

Cite this article

Gounaris, A., Sakellariou, R., Paton, N.W. et al. A novel approach to resource scheduling for parallel query processing on computational grids. Distrib Parallel Databases 19, 87–106 (2006). https://doi.org/10.1007/s10619-006-8490-2

  • DOI: https://doi.org/10.1007/s10619-006-8490-2
