A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

Zhao, Hongwei; Ye, Xiaojun

doi:10.1007/978-3-319-04936-6_7

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8391)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to map the high-level cube model onto the low-level key-value stores built on NoSQL databases, and illustrate how to support OLAP queries efficiently by scaling out while retaining a MapReduce-like execution engine. For big data, the problems of storage and processing power are compounded; we balance them by splitting aggregation between batch processing and query runtime. Base cuboids are initially constructed for TPC-DS fact tables using multidimensional arrays, and cuboids for aggregations at other granularities are derived at runtime from the base ones. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, can scale out seamlessly and provides highly concurrent distributed cuboid processing. Finally, we report experiments on the prototype implementation based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are feasible for future big data analytics.
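
The abstract does not give the key layout or the aggregation code, so the following is only a minimal sketch of the general idea, under stated assumptions: the dimension names, the fixed-width 32-bit big-endian encoding, and the helper names (`encode_key`, `build_base_cuboid`, `roll_up`) are all hypothetical and are not taken from the paper. It shows dimension member IDs packed into a binary key, a base cuboid built in a batch step, and a coarser cuboid derived from it at query time.

```python
import struct
from collections import defaultdict

# Hypothetical dimensions of a store_sales-like fact table (illustrative only).
DIMS = ("date_id", "item_id", "store_id")


def encode_key(members):
    """Pack one unsigned 32-bit member ID per dimension into a binary key.

    Big-endian packing is an assumption made here so that byte order
    matches numeric order, which suits ordered key-value stores.
    """
    return struct.pack(">" + "I" * len(members), *members)


def decode_key(key):
    """Recover the tuple of member IDs from a binary key."""
    return struct.unpack(">" + "I" * (len(key) // 4), key)


def build_base_cuboid(facts):
    """Batch step: sum the measure per full combination of dimension members."""
    cuboid = defaultdict(float)
    for *members, measure in facts:
        cuboid[encode_key(members)] += measure
    return dict(cuboid)


def roll_up(base, keep):
    """Query-time step: derive a coarser cuboid that keeps only some dimensions."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in base.items():
        members = decode_key(key)
        out[encode_key([members[i] for i in idx])] += value
    return dict(out)


# Made-up rows: (date_id, item_id, store_id, sales_amount)
facts = [
    (20130101, 7, 1, 10.0),
    (20130101, 7, 1, 5.0),
    (20130101, 8, 2, 2.5),
    (20130102, 7, 2, 4.0),
]
base = build_base_cuboid(facts)
by_item = roll_up(base, ["item_id"])
```

In this sketch only the base cuboid is materialized ahead of time, and any coarser grouping is computed on demand, which mirrors the batch/runtime split the abstract describes. A distributed version would partition the key space across nodes and merge partial sums, which is not shown here.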

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, 100084, China
Hongwei Zhao & Xiaojun Ye

Editor information

Editors and Affiliations

Data Center Business Group, Cisco Systems, Inc., 275 East Tasman Drive, 95134, San Jose, CA, USA
Raghunath Nambiar
Server Technologies, Oracle Corporation, 500 Oracle Parkway, 94065, Redwood Shores, CA, USA
Meikel Poess

About this paper

Cite this paper

Zhao, H., Ye, X. (2014). A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham. https://doi.org/10.1007/978-3-319-04936-6_7

DOI: https://doi.org/10.1007/978-3-319-04936-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04935-9
Online ISBN: 978-3-319-04936-6
eBook Packages: Computer Science, Computer Science (R0)
