Optimizing Pipelined Execution for Distributed In-Memory OLAP System

Wang, Li; Zhang, Lei; Yu, Chengcheng; Zhou, Aoying

doi:10.1007/978-3-662-43984-5_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

In the coming big data era, the demand for data analysis capability in real applications is growing at an amazing pace. The increasing capacity and decreasing price of memory make it possible and attractive for a distributed OLAP system to load all data into memory and thus significantly improve data processing performance. In this paper, we model the performance of pipelined execution in a distributed in-memory OLAP system and find that data communication among the computation nodes, which is carried out by the data exchange operator, is the performance bottleneck. Consequently, we explore pipelined data exchange in depth and propose a novel solution that is efficient, scalable, and skew-resilient. Experimental results show the effectiveness of our proposals in comparison with state-of-the-art techniques.

Notes

1. Other queries were also evaluated in our experiments and yielded similar results. Due to space limitations, these results are omitted.

Author information

Authors and Affiliations

Software Engineering Institute, East China Normal University, Shanghai, China
Li Wang, Lei Zhang, Chengcheng Yu & Aoying Zhou

Corresponding author

Correspondence to Li Wang.

Editor information

Editors and Affiliations

Pohang University of Science and Technology (POSTECH), Pohang, Korea, Republic of (South Korea)
Wook-Shin Han
National University of Singapore, Singapore, Singapore
Mong Li Lee
Udayana University, Badung, Indonesia
Agus Muliantara
Udayana University, Badung, Indonesia
Ngurah Agus Sanjaya
Christian-Albrechts-Universität zu Kiel Institut für Informatik, Kiel, Germany
Bernhard Thalheim
Fudan University, Shanghai, China
Shuigeng Zhou

About this paper

Cite this paper

Wang, L., Zhang, L., Yu, C., Zhou, A. (2014). Optimizing Pipelined Execution for Distributed In-Memory OLAP System. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_15

DOI: https://doi.org/10.1007/978-3-662-43984-5_15
Published: 11 July 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science (R0)
