A Vision for Next Generation Query Processors and an Associated Research Agenda

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5697)

Abstract

Query processing is one of the most important mechanisms for data management, and there exist mature techniques for effective query optimization and efficient query execution. The vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for ACID properties. However, the emergence of new computing paradigms, such as grid and cloud computing, the increasingly large volumes of data commonly processed, the need to support data-driven research, intensive data analysis and new scenarios, such as processing data streams on the fly or querying web services, the fact that the metadata fed to optimizers are often missing at compile time, and the growing interest in novel optimization criteria, such as monetary cost or energy consumption, create a unique set of new requirements for query processing systems. These requirements cannot be met by modern techniques in their entirety, although interesting solutions and efficient tools have already been developed for some of them in isolation. Next generation query processors are expected to combine features addressing all of these issues, and, consequently, lie at the confluence of several research initiatives. This paper aims to present a vision for such processors, to explain their functionality requirements, and to discuss the open issues, along with their challenges.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gounaris, A. (2009). A Vision for Next Generation Query Processors and an Associated Research Agenda. In: Hameurlain, A., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2009. Lecture Notes in Computer Science, vol 5697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03715-3_1

  • DOI: https://doi.org/10.1007/978-3-642-03715-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03714-6

  • Online ISBN: 978-3-642-03715-3

  • eBook Packages: Computer Science, Computer Science (R0)
