Abstract
Query processing is one of the most important mechanisms for data management, and mature techniques exist for effective query optimization and efficient query execution. The vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for ACID properties. However, several developments create a unique set of new requirements for query processing systems: the emergence of new computing paradigms, such as grid and cloud computing; the increasingly large volumes of data commonly processed; the need to support data-driven research, intensive data analysis, and new scenarios, such as processing data streams on the fly or querying web services; the fact that the metadata fed to optimizers are often missing at compile time; and the growing interest in novel optimization criteria, such as monetary cost or energy consumption. Existing techniques cannot meet these requirements in their entirety, although interesting solutions and efficient tools have already been developed for some of them in isolation. Next-generation query processors are expected to combine features addressing all of these issues and, consequently, lie at the confluence of several research initiatives. This paper aims to present a vision for such processors, to explain their functionality requirements, and to discuss the open issues, along with their challenges.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gounaris, A. (2009). A Vision for Next Generation Query Processors and an Associated Research Agenda. In: Hameurlain, A., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2009. Lecture Notes in Computer Science, vol 5697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03715-3_1
DOI: https://doi.org/10.1007/978-3-642-03715-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03714-6
Online ISBN: 978-3-642-03715-3
eBook Packages: Computer Science, Computer Science (R0)