LSShare: an efficient multiple query optimization system in the cloud

Distributed and Parallel Databases

Abstract

Multiple query optimization (MQO) in the cloud has become a promising research direction because cloud computing platforms routinely run massive data analysis queries (jobs). These CPU- and I/O-intensive analysis queries are complex and time-consuming, yet they share common components. Detecting, sharing, and reusing those common components among thousands of SQL-like queries is challenging. Previous solutions to MQO, whether heuristic or genetic-algorithm based, do not scale to large, growing query sets. In this paper, we develop a sharing system called LSShare based on our proposed Lineage-Signature approach. With LSShare, we can efficiently solve the MQO problem for recurring query sets in the cloud. Our system has been prototyped in a distributed system built for massive data analysis on Alibaba’s cloud computing platform (http://www.alibaba.com/). Experimental results on real data sets demonstrate the efficiency and effectiveness of the proposed approach.

Acknowledgments

This work was supported by the NSFC (No. 61202025, 61373031, 61373156, 61272099, and 61261160502), the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT), the Scientific Innovation Act of STCSM (No. 13511504200), Singapore NRF (CREATE E2S2), the EU FP7 CLIMBER project (No. PIRSES-GA-2012-318939), the State High-Tech Development Plan (2013AA01A601), Microsoft Research Asia (the Urban Informatics Research Grant), and STCSM Grant (No. 12ZR1414900).

Author information

Corresponding author

Correspondence to Bin Yao.

About this article

Cite this article

Ge, X., Yao, B., Guo, M. et al. LSShare: an efficient multiple query optimization system in the cloud. Distrib Parallel Databases 32, 583–605 (2014). https://doi.org/10.1007/s10619-014-7150-1

  • DOI: https://doi.org/10.1007/s10619-014-7150-1
