A Distributed Self-adaption Cube Building Model Based on Query Log

Song, Meina; Li, Mingkun; Li, Zhuohuan; E., Haihong

doi:10.1007/978-3-319-74521-3_41

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10745)

Included in the following conference series:

International Conference on Human Centered Computing

Abstract

Among distributed query and analysis engines, Kylin has gained wide adoption because of its various strengths. With Kylin, users can interact with Hadoop data at sub-second latency. However, it still has some disadvantages. A representative one is that the number of cuboids grows exponentially with the number of dimensions. In this paper, we optimize Kylin's cuboid materialization strategy by reducing the number of cuboids, building on traditional OLAP optimization methods. We optimize the strategy in two main ways. First, we propose a Lazy-Building strategy that delays the construction of nonessential cuboids and shortens cuboid initialization time. Second, we adopt a Materialized View Self-adjusting Algorithm to eliminate cuboids that have not been used for a long period. Experimental results demonstrate the efficacy of the proposed Distributed Self-Adaption Cube Building Model. Specifically, compared with Kylin's cube building model, our model increases cube initialization speed by 28.5 percentage points and saves 65.8 percentage points of space.
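
The abstract gives no algorithmic detail, so the following Python sketch is only an illustration of the two ideas it names, not the authors' implementation. It assumes that a cuboid is a subset of the cube's dimensions (so n dimensions yield 2^n cuboids), that "essential" cuboids are supplied by the caller, and that "not in use for a long period" can be modelled as a count of query-log entries. The names `all_cuboids`, `LazyCubeSketch`, `essential`, and `idle_limit` are hypothetical and do not come from the paper or from Kylin.

```python
from itertools import combinations


def all_cuboids(dimensions):
    """Enumerate every cuboid (subset of dimensions); there are 2**n of them."""
    dims = sorted(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


class LazyCubeSketch:
    """Toy model: essential cuboids exist up front, the rest are built on demand."""

    def __init__(self, dimensions, essential, idle_limit):
        self.dimensions = frozenset(dimensions)
        self.essential = {frozenset(c) for c in essential}
        self.idle_limit = idle_limit  # assumed threshold, counted in query-log entries
        self.clock = 0                # number of query-log entries seen so far
        # Maps each materialized cuboid to the clock value of its last use.
        self.last_used = {c: 0 for c in self.essential}

    def query(self, group_by):
        """Record one query-log entry; return True if the cuboid had to be built."""
        cuboid = frozenset(group_by)
        if not cuboid <= self.dimensions:
            raise ValueError("unknown dimension in query")
        self.clock += 1
        built_now = cuboid not in self.last_used  # lazy building on first use
        self.last_used[cuboid] = self.clock
        return built_now

    def evict_idle(self):
        """Drop non-essential cuboids idle for more than idle_limit entries."""
        stale = [c for c, t in self.last_used.items()
                 if c not in self.essential and self.clock - t > self.idle_limit]
        for c in stale:
            del self.last_used[c]
        return stale
```

For example, `len(all_cuboids(["city", "date", "product"]))` is 8, whereas a `LazyCubeSketch` created with only the full three-dimension cuboid marked essential starts with a single materialized cuboid and adds others only when `query` first asks for them. Calling `evict_idle` periodically then removes cuboids that the log has stopped referencing.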

Acknowledgement

This work is supported by the National Key Project of Scientific and Technical Supporting Programs of China (Grant No. 2015BAH07F01) and by the Engineering Research Center of Information Networks, Ministry of Education.

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Meina Song, Mingkun Li, Zhuohuan Li & Haihong E.

Corresponding authors

Correspondence to Meina Song, Mingkun Li, Zhuohuan Li or Haihong E.

Editor information

Editors and Affiliations

Wuhan University of Technology, Wuhan, China
Qiaohong Zu
Fujitsu Laboratories of Europe Ltd., Hayes, United Kingdom
Bo Hu

About this paper

Cite this paper

Song, M., Li, M., Li, Z., E., H. (2018). A Distributed Self-adaption Cube Building Model Based on Query Log. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2017. Lecture Notes in Computer Science, vol 10745. Springer, Cham. https://doi.org/10.1007/978-3-319-74521-3_41

DOI: https://doi.org/10.1007/978-3-319-74521-3_41
Published: 23 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74520-6
Online ISBN: 978-3-319-74521-3
eBook Packages: Computer Science; Computer Science (R0)
