Abstract
While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to map the high-level cube model onto the low-level key-value stores built on NoSQL databases, and illustrate how to support OLAP queries efficiently by scaling out while retaining a MapReduce-like execution engine. For big data, the problems of storage and processing power compound each other; we balance them by splitting partial aggregation between batch processing and query runtime. Base cuboids are initially constructed for the TPC-DS fact tables using multidimensional arrays, and cuboids at other aggregation granularities are derived from the base cuboids at runtime. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, can scale out seamlessly and provides highly concurrent distributed cuboid processing. Finally, we report experiments on the prototype implementation based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are feasible for future big data analytics.
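The two storage ideas in the abstract, binary keys built from dimension members and coarser cuboids derived from a base cuboid at runtime, can be illustrated with a minimal sketch. This is not the authors' implementation: the dimension dictionaries, the 16-bit field width, the in-memory dict standing in for the key-value store, and the function names `encode_key`, `decode_key`, and `roll_up` are all assumptions made for illustration.

```python
import struct
from collections import defaultdict

# Hypothetical dimension dictionaries: member name -> small integer id.
DATE_IDS = {"2001-Q1": 0, "2001-Q2": 1}
ITEM_IDS = {"Books": 0, "Music": 1}
STORE_IDS = {"TN": 0, "CA": 1}


def encode_key(date_id, item_id, store_id):
    """Pack three member ids into a fixed-width, big-endian binary key.

    Assumption: each id fits in an unsigned 16-bit field.
    """
    return struct.pack(">HHH", date_id, item_id, store_id)


def decode_key(key):
    """Recover the (date, item, store) ids from a binary key."""
    return struct.unpack(">HHH", key)


def roll_up(base_cuboid, keep):
    """Derive a coarser cuboid by summing the measure over dropped dimensions.

    `keep` lists the positions of the dimensions to retain.
    """
    out = defaultdict(float)
    for key, measure in base_cuboid.items():
        ids = decode_key(key)
        out[tuple(ids[i] for i in keep)] += measure
    return dict(out)


# A plain dict stands in for the key-value store holding the base cuboid.
base = {
    encode_key(DATE_IDS["2001-Q1"], ITEM_IDS["Books"], STORE_IDS["TN"]): 10.0,
    encode_key(DATE_IDS["2001-Q1"], ITEM_IDS["Music"], STORE_IDS["TN"]): 5.0,
    encode_key(DATE_IDS["2001-Q2"], ITEM_IDS["Books"], STORE_IDS["CA"]): 7.0,
    encode_key(DATE_IDS["2001-Q1"], ITEM_IDS["Books"], STORE_IDS["CA"]): 2.5,
}

by_date = roll_up(base, keep=(0,))          # aggregate away item and store
by_date_item = roll_up(base, keep=(0, 1))   # aggregate away store only
```

Fixed-width big-endian fields make the byte order of the keys match the order of the id tuples, which is one reason such an encoding suits range scans in an ordered key-value store. Whether the paper's storage module uses this layout is not stated in the abstract.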
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, H., Ye, X. (2014). A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham. https://doi.org/10.1007/978-3-319-04936-6_7
DOI: https://doi.org/10.1007/978-3-319-04936-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04935-9
Online ISBN: 978-3-319-04936-6
eBook Packages: Computer Science, Computer Science (R0)