Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options

  • Regular Paper
The VLDB Journal

Abstract

Writing parallel programs that can take advantage of non-dedicated processors is much more difficult than writing such programs for networks of dedicated processors. In a non-dedicated environment such programs must use autonomic techniques to respond to the unpredictable load fluctuations that prevail in the computational environment. In adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor load assignments throughout a computation to take account of varying resource capabilities, but we know of no previous study that compares their performance. This paper presents a simulation-based evaluation of these autonomic parallelization techniques in a uniform environment and compares how well they improve the performance of the computation. Four published strategies are compared with a new algorithm that seeks to overcome some weaknesses identified in the existing approaches. In addition, we explore the use of techniques from online algorithms to provide a firm foundation for determining when to adapt in two of the existing algorithms. The evaluations identify situations in which each strategy may be used effectively and in which it should be avoided.

Author information

Corresponding author

Correspondence to Norman W. Paton.

About this article

Cite this article

Paton, N.W., Buenabad-Chavez, J., Chen, M. et al. Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options. The VLDB Journal 18, 119–140 (2009). https://doi.org/10.1007/s00778-007-0090-x

  • DOI: https://doi.org/10.1007/s00778-007-0090-x
