A Distributed Self-adaption Cube Building Model Based on Query Log

  • Conference paper
Human Centered Computing (HCC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10745)

Abstract

Among the many distributed query and analysis engines, Kylin has gained wide adoption owing to its various strengths. With Kylin, users can interact with Hadoop data at sub-second latency. However, it still has some disadvantages. A representative one is that the number of cuboids grows exponentially with the number of dimensions. In this paper, we optimize the cuboid materialization strategy of Kylin by reducing the number of cuboids, building on traditional OLAP optimization methods. We optimize the strategy in two main respects. First, we propose a Lazy-Building strategy that delays the construction of nonessential cuboids and shortens cuboid initialization time. Second, we adopt a Materialized View Self-adjusting Algorithm to eliminate cuboids that have not been used for a long period. Experimental results demonstrate the efficacy of the proposed Distributed Self-Adaption Cube Building Model. Specifically, compared with the cube building model of Kylin, our model increases cube initialization speed by 28.5 percentage points and saves 65.8 percentage points of space.
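The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use Kylin's API: the names `enumerate_cuboids`, `LazyCube`, `query`, `evict_idle`, and the `max_idle` threshold are hypothetical, and the "query log" is reduced to a logical clock that records when each cuboid was last requested. The first function shows why full materialization is costly (n dimensions yield 2^n cuboids); the class shows one plausible reading of lazy building plus eviction of long-unused cuboids.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Return every subset of the dimensions; each subset is one cuboid."""
    dims = list(dimensions)
    return [
        frozenset(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]


class LazyCube:
    """Toy model (assumption): build cuboids on first use, evict idle ones."""

    def __init__(self, dimensions, max_idle):
        self.base = frozenset(dimensions)
        self.max_idle = max_idle  # hypothetical idle threshold, in queries
        self.clock = 0            # logical time: number of queries seen
        # Only the base cuboid is materialized at initialization.
        self.last_used = {self.base: 0}

    def query(self, group_by):
        """Record a query; return True if its cuboid had to be built now."""
        cuboid = frozenset(group_by)
        if not cuboid <= self.base:
            raise ValueError("unknown dimension in query")
        self.clock += 1
        built_now = cuboid not in self.last_used
        self.last_used[cuboid] = self.clock
        return built_now

    def evict_idle(self):
        """Drop cuboids unused for more than max_idle queries (never the base)."""
        stale = [
            cuboid
            for cuboid, used_at in self.last_used.items()
            if cuboid != self.base and self.clock - used_at > self.max_idle
        ]
        for cuboid in stale:
            del self.last_used[cuboid]
        return stale


if __name__ == "__main__":
    print(len(enumerate_cuboids(["date", "city", "product"])))  # 2**3 cuboids
    cube = LazyCube(["date", "city", "product"], max_idle=2)
    print(cube.query(["city"]))  # first request triggers a build
    print(cube.query(["city"]))  # second request reuses the cuboid
```

In this sketch the base cuboid is always kept so that any evicted cuboid can be rebuilt from it on a later query; whether the paper makes the same choice is not stated in the abstract.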

Acknowledgement

This work is supported by the National Key Project of Scientific and Technical Supporting Programs of China (Grant No. 2015BAH07F01) and by the Engineering Research Center of Information Networks, Ministry of Education.

Author information

Corresponding authors

Correspondence to Meina Song, Mingkun Li, Zhuohuan Li or Haihong E.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Song, M., Li, M., Li, Z., E., H. (2018). A Distributed Self-adaption Cube Building Model Based on Query Log. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2017. Lecture Notes in Computer Science, vol 10745. Springer, Cham. https://doi.org/10.1007/978-3-319-74521-3_41

  • DOI: https://doi.org/10.1007/978-3-319-74521-3_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74520-6

  • Online ISBN: 978-3-319-74521-3

  • eBook Packages: Computer Science, Computer Science (R0)