Adaptive workload allocation in query processing in autonomous heterogeneous environments

Distributed and Parallel Databases

Abstract

The increasing prevalence of networked storage and computational resources, along with middleware for managing resource access and sharing, raises the prospect that queries can be run over resources obtained on demand, rather than on dedicated infrastructures. However, the movement of query processing into non-dedicated environments means that it is necessary to take account of the partial information and unstable conditions that characterise autonomous, shared, distributed settings. Thus, query processing on grid platforms needs to be adaptive, revising evaluation strategies at query runtime in response to the evolving environment, such as changes to machine load and availability. To address this challenge, adaptive techniques are described that: (i) balance load across plan partitions supporting intra-operator parallelism; (ii) remove bottlenecks in pipelined plans supporting inter-operator parallelism; and (iii) combine the two aforementioned techniques. The approach has been empirically evaluated in a grid-enabled adaptive query processor.
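To make technique (i) concrete, the following is a minimal sketch of one way load could be rebalanced across plan partitions. It is not the algorithm from the article, whose details the abstract does not give. It assumes each machine reports an observed average cost per tuple, and it assigns each machine a share of the incoming tuples inversely proportional to that cost. The names `rebalance`, `should_adapt`, and the `threshold` parameter are hypothetical.

```python
from typing import Dict


def rebalance(per_tuple_cost: Dict[str, float]) -> Dict[str, float]:
    """Return the fraction of tuples to route to each machine.

    Assumption (not from the article): a machine's share is inversely
    proportional to its observed average cost per tuple, so faster
    machines receive proportionally more work.
    """
    if not per_tuple_cost:
        raise ValueError("at least one machine is required")
    if any(cost <= 0 for cost in per_tuple_cost.values()):
        raise ValueError("per-tuple costs must be positive")
    # Processing rate is the reciprocal of the per-tuple cost.
    rates = {machine: 1.0 / cost for machine, cost in per_tuple_cost.items()}
    total = sum(rates.values())
    return {machine: rate / total for machine, rate in rates.items()}


def should_adapt(current: Dict[str, float],
                 proposed: Dict[str, float],
                 threshold: float = 0.1) -> bool:
    """Adapt only if some machine's share would change by more than
    `threshold` (a hypothetical guard against reacting to small,
    transient fluctuations in load)."""
    return max(abs(proposed[m] - current.get(m, 0.0)) for m in proposed) > threshold


if __name__ == "__main__":
    current = {"a": 0.5, "b": 0.5}
    # Machine "b" has become twice as slow per tuple as machine "a".
    proposed = rebalance({"a": 1.0, "b": 2.0})
    print(proposed, should_adapt(current, proposed))
```

In this sketch the monitoring step (collecting per-tuple costs at query runtime) is taken as given. A real adaptive processor would also need to decide what to do with tuples already routed under the old distribution, which the sketch does not address.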

Author information

Corresponding author

Correspondence to Anastasios Gounaris.

Additional information

Communicated by Ahmed K. Elmagarmid.

About this article

Cite this article

Gounaris, A., Smith, J., Paton, N.W. et al. Adaptive workload allocation in query processing in autonomous heterogeneous environments. Distrib Parallel Databases 25, 125–164 (2009). https://doi.org/10.1007/s10619-008-7032-5

  • DOI: https://doi.org/10.1007/s10619-008-7032-5
