A Framework for Designing Autonomous Parallel Data Warehouses

Benkrid, Soumia; Bellatreche, Ladjel

doi:10.1007/978-3-030-38961-1_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11945)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Parallel data platforms are recognized as a key solution for processing analytical queries over extremely large data warehouses (DWs). Deploying a DW on such a platform requires efficient data partitioning and allocation techniques. Most of these techniques assume a priori knowledge of the workload, and reactive strategies are mainly used to cope with workload evolution. The BI 2.0 requirements, however, have put large batch and ad-hoc user queries at the center. Consequently, reactive solutions for deploying a DW on parallel platforms are no longer sufficient. Autonomous computing has emerged as a paradigm that allows digital objects to manage themselves in accordance with high-level guidance by means of proactive approaches. Inspired by this paradigm, we propose in this paper a proactive approach, based on a query clustering model, for deploying a DW over a parallel platform. The query clustering triggers the partitioning and allocation processes by considering only the query groups that have evolved. Intensive experiments were conducted to show the efficiency of our proposal.

Author information

Authors and Affiliations

Ecole nationale Supérieure d’Informatique (ESI), Algiers, Algeria
Soumia Benkrid
LIAS/ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche

Corresponding authors

Correspondence to Soumia Benkrid or Ladjel Bellatreche.

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Melbourne, VIC, Australia
Sheng Wen
School of Computer Science, The University of Sydney, Camperdown, NSW, Australia
Albert Zomaya
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang

About this paper

Cite this paper

Benkrid, S., Bellatreche, L. (2020). A Framework for Designing Autonomous Parallel Data Warehouses. In: Wen, S., Zomaya, A., Yang, L.T. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11945. Springer, Cham. https://doi.org/10.1007/978-3-030-38961-1_9

DOI: https://doi.org/10.1007/978-3-030-38961-1_9
Published: 22 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38960-4
Online ISBN: 978-3-030-38961-1
eBook Packages: Computer Science, Computer Science (R0)
