Abstract
In this paper, we study and optimize aggregate query processing in a highly distributed Cloud Data Warehouse, where each database stores a subset of relational data in a star schema. Existing aggregate query processing algorithms, such as the two-phase algorithm, focus on optimizing individual query operations but pay little attention to communication cost. In cloud architectures, however, communication overhead is an important factor in query processing, so we take it into account to improve distributed query processing in such cloud data warehouses. We design query-processing algorithms by analyzing the aggregate operation and eliminating most of the sort and group-by operations with the help of integrity constraints and our proposed storage structures, PK-map and Tuple-index-map. Extensive experiments on PlanetLab cloud machines validate the effectiveness of our proposed framework in improving response time, reducing node-to-node interdependency, minimizing communication overhead, and reducing the database table accesses required for aggregate queries.
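For context, the two-phase algorithm that the abstract names as the existing baseline is commonly described as local partial aggregation on each node followed by a merge at a coordinator. The sketch below illustrates that general idea only. It is not the algorithm proposed in this paper and does not model PK-map or Tuple-index-map. The function names, node names, and sample rows are hypothetical.

```python
from collections import defaultdict

def local_aggregate(rows):
    """Phase 1 (on each node): partial (sum, count) per group key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {k: (s, c) for k, (s, c) in partial.items()}

def global_merge(partials):
    """Phase 2 (on the coordinator): merge partials into (sum, count, avg)."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {k: (s, c, s / c) for k, (s, c) in merged.items()}

# Hypothetical rows of (group key, measure) held by two nodes.
node_a = [("asia", 10.0), ("europe", 4.0), ("asia", 6.0)]
node_b = [("asia", 2.0), ("europe", 8.0)]

result = global_merge([local_aggregate(node_a), local_aggregate(node_b)])
```

In this scheme only one partial record per group per node crosses the network, but every node still has to group its local rows and ship its partials to the coordinator, which is the communication cost the abstract refers to.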
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kurunji, S., Ge, T., Fu, X., Liu, B., Kumar, A., Chen, C.X. (2014). Optimizing Aggregate Query Processing in Cloud Data Warehouses. In: Hameurlain, A., Dang, T.K., Morvan, F. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2014. Lecture Notes in Computer Science, vol 8648. Springer, Cham. https://doi.org/10.1007/978-3-319-10067-8_1
DOI: https://doi.org/10.1007/978-3-319-10067-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10066-1
Online ISBN: 978-3-319-10067-8
eBook Packages: Computer Science, Computer Science (R0)