Abstract
Multiple query optimization (MQO) in the cloud has become a promising research direction owing to the popularity of cloud computing platforms, which routinely run massive data-analysis queries (jobs). These CPU- and I/O-intensive analysis queries are complex and time-consuming, but they often share common components. Detecting, sharing, and reusing these common components among thousands of SQL-like queries is challenging. Previous MQO solutions, whether heuristic or genetic-algorithm based, are not suited to large and continually growing query sets. In this paper, we develop a sharing system called LSShare, based on our proposed Lineage-Signature approach. With LSShare, we can efficiently solve the MQO problem for recurring query sets in the cloud. We have prototyped our system within a distributed system for massive data analysis built on Alibaba’s cloud computing platform (http://www.alibaba.com/). Experimental results on real data sets demonstrate the efficiency and effectiveness of the proposed approach.
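The abstract does not describe how the Lineage-Signature approach works internally. As a rough illustration of the general idea of signature-based sharing detection (not LSShare's actual algorithm), the sketch below assigns each subplan a hash computed bottom-up from its operator, its arguments, and the signatures of its inputs, so that structurally identical subplans receive identical signatures; it then reports subplans that occur in more than one query. `PlanNode`, `shared_subexpressions`, and the example plans are hypothetical names chosen for this illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical logical-plan node: operator name, arguments, and input subplans."""
    op: str
    args: tuple = ()
    children: tuple = ()


def _index_subtrees(node, query_id, index):
    """Post-order walk: compute a signature for every subtree of one query plan
    and record which query it occurs in. Returns the signature of `node`."""
    child_sigs = tuple(_index_subtrees(c, query_id, index) for c in node.children)
    # repr of a tuple of strings gives an unambiguous encoding to hash.
    payload = repr((node.op, node.args, child_sigs))
    sig = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    entry = index[sig]
    entry["node"] = node  # keep one representative subtree
    entry["queries"].add(query_id)
    return sig


def shared_subexpressions(queries):
    """Map signature -> (representative subtree, ids of queries containing it),
    keeping only subtrees that occur in at least two distinct queries."""
    index = defaultdict(lambda: {"node": None, "queries": set()})
    for qid, root in queries.items():
        _index_subtrees(root, qid, index)
    return {
        sig: (e["node"], e["queries"])
        for sig, e in index.items()
        if len(e["queries"]) > 1
    }


if __name__ == "__main__":
    # Two queries that independently build the same filtered scan of `orders`.
    def daily_orders():
        return PlanNode("Filter", ("dt = '2014-01-01'",),
                        (PlanNode("Scan", ("orders",)),))

    q1 = PlanNode("Aggregate", ("SUM(amount)", "GROUP BY seller"), (daily_orders(),))
    q2 = PlanNode("Aggregate", ("COUNT(*)", "GROUP BY buyer"), (daily_orders(),))
    for sig, (node, qids) in shared_subexpressions({"q1": q1, "q2": q2}).items():
        print(sig[:12], node.op, sorted(qids))
```

In this example the `Scan` of `orders` and the `Filter` above it are reported as shared by `q1` and `q2`, while the two different aggregations are not. A practical system would also need to normalize plans (for example, ordering the inputs of commutative joins) and weigh the cost of materializing a shared result against the expected savings, both of which this sketch omits.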
Acknowledgments
This work was supported by the NSFC (No. 61202025, 61373031, 61373156, 61272099, and 61261160502), the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT), the Scientific Innovation Act of STCSM (No. 13511504200), Singapore NRF (CREATE E2S2), the EU FP7 CLIMBER project (No. PIRSES-GA-2012-318939), the State High-Tech Development Plan (2013AA01A601), Microsoft Research Asia (the Urban Informatics Research Grant), and STCSM Grant (No. 12ZR1414900).
Cite this article
Ge, X., Yao, B., Guo, M. et al. LSShare: an efficient multiple query optimization system in the cloud. Distrib Parallel Databases 32, 583–605 (2014). https://doi.org/10.1007/s10619-014-7150-1
DOI: https://doi.org/10.1007/s10619-014-7150-1