Skip to main content

A Framework for Designing Autonomous Parallel Data Warehouses

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11945))

Abstract

Parallel data platforms are recognized as a key solution for processing analytical queries running on extremely large data warehouses (DWs). Deploying a DW on such platforms requires efficient data partitioning and allocation techniques. Most of these techniques assume a priori knowledge of workload. To deal with their evolution, reactive strategies are mainly used. The BI 2.0 requirements have put large batch and ad-hoc user queries at the center. Consequently, reactive-based solutions for deploying a DW in parallel platforms are not sufficient. Autonomous computing has emerged as a paradigm that allows digital objects managing themselves in accordance with high-level guidance by the means of proactive approaches. Being inspired by this paradigm, we propose in this paper, a proactive approach based on a query clustering model to deploying a DW over a parallel platform. The query clustering triggers partitioning and allocation processes by considering only evolved query groups. Intensive experiments were conducted to show the efficiency of our proposal.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Benkrid, S., Bellatreche, L., Cuzzocrea, A.: A global paradigm for designing parallel relational data warehouses in distributed environments. Trans. Large-Scale Data- Knowl.-Cent. Syst. 15, 64–101 (2014)

    Google Scholar 

  2. Boukorca, A., Bellatreche, L., Benkrid, S.: HYPAD: hyper-graph-driven approach for parallel data warehouse design. In: Wang, G., Zomaya, A., Perez, G.M., Li, K. (eds.) ICA3PP 2015, Part IV. LNCS, vol. 9531, pp. 770–783. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27140-8_53

    Chapter  Google Scholar 

  3. Cámara, J., et al.: Self-aware computing systems: related concepts and research areas. In: Kounev, S., Kephart, J., Milenkoski, A., Zhu, X. (eds.) Self-Aware Computing Systems, pp. 17–49. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47474-8_2

    Chapter  Google Scholar 

  4. Du, J., Miller, R.J., Glavic, B., Tan, W.: DeepSea: progressive workload-aware partitioning of materialized views in scalable data analytics. In: EDBT, pp. 198–209 (2017)

    Google Scholar 

  5. Durand, G.C., et al.: GridFormation: towards self-driven online data partitioning using reinforcement learning. In: First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1–7 (2018)

    Google Scholar 

  6. Ghosh, A., Parikh, J., Sengar, V.S., Haritsa, J.R.: Plan selection based on query clustering. In: VLDB, pp. 179–190 (2002)

    Google Scholar 

  7. Hintikka, J.: ‘Knowing that one knows’ reviewed. Synthese 21(2), 141–162 (1970)

    Article  Google Scholar 

  8. Horn, P.: Autonomic computing: IBM\(\backslash \)’s perspective on the state of information technology. IBM (2001)

    Google Scholar 

  9. Jindal, A., Karanasos, K., Rao, S., Patel, H.: Selecting subexpressions to materialize at datacenter scale. Proc. VLDB Endow. 11(7), 800–812 (2018)

    Article  Google Scholar 

  10. Ma, L., Van Aken, D., Hefny, A., Mezerhane, G., Pavlo, A., Gordon, G.J.: Query-based workload forecasting for self-driving database management systems. In: ACM SIGMOD, pp. 631–645 (2018)

    Google Scholar 

  11. Nehme, R. Bruno, N.: Automated partitioning design in parallel database systems. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, pp. 1137–1148. ACM, New York (2011)

    Google Scholar 

  12. Goldschmidt, O., Nehme, D., Yu, G.: Note: on the set-union knapsack problem. Nav. Res. Logist. (NRL) 41(6), 833–842 (1994)

    Article  Google Scholar 

  13. Roy, P., Sudarshan, S.: Multi-query optimization. In: Encyclopedia of Database Systems, 2nd edn (2018)

    Google Scholar 

  14. Sellis, T.K.: Multiple-query optimization. ACM Trans. Database Syst. 13(1), 23–52 (1988)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Soumia Benkrid or Ladjel Bellatreche .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Benkrid, S., Bellatreche, L. (2020). A Framework for Designing Autonomous Parallel Data Warehouses. In: Wen, S., Zomaya, A., Yang, L.T. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science(), vol 11945. Springer, Cham. https://doi.org/10.1007/978-3-030-38961-1_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-38961-1_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38960-4

  • Online ISBN: 978-3-030-38961-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics