Abstract
Among the diverse distributed query and analysis engines, Kylin has gained wide adoption owing to its various strengths. With Kylin, users can interact with Hadoop data at sub-second latency. However, it still has some disadvantages. A representative one is that the number of cuboids grows exponentially with the number of dimensions. In this paper, we optimize Kylin's cuboid materialization strategy by reducing the number of cuboids, building on traditional OLAP optimization methods. We optimize the strategy in two main respects. First, we propose a Lazy-Building strategy that delays the construction of nonessential cuboids and shortens cuboid initialization time. Second, we adopt a Materialized View Self-adjusting Algorithm to eliminate cuboids that have not been used for a long period. Experimental results demonstrate the efficacy of the proposed Distributed Self-Adaption Cube Building Model. Specifically, compared with the cube building model of Kylin, our model increases cube initialization speed by 28.5% and saves 65.8% of storage space.
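To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a full cube over n dimensions has 2^n cuboids (one per subset of dimensions), builds only the base cuboid up front, materializes other cuboids the first time a query from the log needs them, and evicts cuboids that have been idle too long. The class name `LazyCube`, the `max_idle` threshold, the integer `tick` timestamps, and the example dimension names are all hypothetical.

```python
from itertools import combinations


def all_cuboids(dimensions):
    """Enumerate every cuboid (subset of dimensions) of a full cube: 2**n in total."""
    dims = sorted(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


class LazyCube:
    """Toy cube that materializes cuboids on first use and evicts idle ones."""

    def __init__(self, dimensions, max_idle):
        self.dimensions = frozenset(dimensions)
        self.max_idle = max_idle  # hypothetical idle threshold, in log ticks
        # Only the base cuboid is built up front; everything else is deferred.
        self.last_used = {self.dimensions: 0}

    def query(self, group_by, tick):
        """Record a logged query; return True if its cuboid had to be built now."""
        cuboid = frozenset(group_by)
        if not cuboid <= self.dimensions:
            raise ValueError("unknown dimension in query")
        built_now = cuboid not in self.last_used
        self.last_used[cuboid] = tick
        return built_now

    def evict_idle(self, tick):
        """Drop cuboids idle for more than max_idle ticks; the base cuboid is kept."""
        stale = [c for c, t in self.last_used.items()
                 if c != self.dimensions and tick - t > self.max_idle]
        for c in stale:
            del self.last_used[c]
        return stale


# Example dimensions are made up for illustration.
dims = ["region", "product", "day", "channel"]
full = all_cuboids(dims)                 # eager strategy: every subset

cube = LazyCube(dims, max_idle=10)
first = cube.query(["region"], tick=1)   # cuboid built on first use
cube.query(["region", "day"], tick=2)
again = cube.query(["region"], tick=15)  # already materialized, only refreshed
evicted = cube.evict_idle(tick=15)       # (region, day) has been idle for 13 ticks
```

In this toy run the eager strategy would build 16 cuboids for four dimensions, while the lazy cube ends up holding only the base cuboid and the one cuboid the log still uses. A real system would store aggregated data rather than timestamps and would weigh build cost against query benefit.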
Acknowledgement
This work is supported by the National Key Project of Scientific and Technical Supporting Programs of China (Grant No. 2015BAH07F01) and by the Engineering Research Center of Information Networks, Ministry of Education.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Song, M., Li, M., Li, Z., E., H. (2018). A Distributed Self-adaption Cube Building Model Based on Query Log. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2017. Lecture Notes in Computer Science, vol 10745. Springer, Cham. https://doi.org/10.1007/978-3-319-74521-3_41
DOI: https://doi.org/10.1007/978-3-319-74521-3_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74520-6
Online ISBN: 978-3-319-74521-3
eBook Packages: Computer Science (R0)