Abstract
In the coming big data era, the demand for data analysis capability in real applications is growing at amazing pace. The memory’s increasing capacity and decreasing price make it possible and attractive for the distributed OLAP system to load all the data into memory and thus significantly improve the data processing performance. In this paper, we model the performance of pipelined execution in distributed in-memory OLAP system and figure out that the data communication among the computation nodes, which is achieved by data exchange operator, is the performance bottleneck. Consequently, we explore the pipelined data exchange in depth and give a novel solution that is efficient, scalable, and skew-resilient. Experimental results show the effectiveness of our proposals by comparing with state-of-art techniques.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Other queries are also evaluated in our experiment and result in similar results. Due to space limitation, these results are omitted.
References
Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pp. 671–682. ACM (2006)
Albutiu, M.-C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. Proc. VLDB Endow. 5(10), 1064–1075 (2012)
Barlow, M.: Real-Time Big Data Analytics: Emerging Architecture. O’Reilly Media Inc., Sebastopol (2013)
Boncz, P.A., Zukowski, M., Nes, N.: Monetdb/x100: hyper-pipelining query execution. CIDR 5, 225–237 (2005)
Borthakur, D.: The hadoop distributed file system: architecture and design (2007)
Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 339–350. VLDB Endowment (2007)
Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: Mapreduce online. In: NSDI, vol. 10, pp. 20 (2010)
Graefe, G.: Volcano-an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)
Kumar, V.S., Tucek, J., Wylie, J.J., Krevat, E., Ganger, G.R.: Application-level flow scheduling for efficient collective data transfers (2012)
Plattner, H.: A common database approach for oltp and olap using an in-memory column database. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 1–2. ACM (2009)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Proceedings of the Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the NSDI, p. 2. USENIX Association (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Zhang, L., Yu, C., Zhou, A. (2014). Optimizing Pipelined Execution for Distributed In-Memory OLAP System. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science(), vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_15
Download citation
DOI: https://doi.org/10.1007/978-3-662-43984-5_15
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer ScienceComputer Science (R0)