Abstract
Parallel data platforms are recognized as a key solution for processing analytical queries over extremely large data warehouses (DWs). Deploying a DW on such platforms requires efficient data partitioning and allocation techniques. Most of these techniques assume a priori knowledge of the workload, and reactive strategies are mainly used to cope with workload evolution. BI 2.0 requirements, however, have put large batch and ad-hoc user queries at the center. Consequently, reactive solutions for deploying a DW on parallel platforms are no longer sufficient. Autonomous computing has emerged as a paradigm that allows digital objects to manage themselves in accordance with high-level guidance by means of proactive approaches. Inspired by this paradigm, we propose in this paper a proactive approach, based on a query clustering model, for deploying a DW over a parallel platform. The query clustering triggers the partitioning and allocation processes by considering only the query groups that have evolved. Intensive experiments were conducted to show the efficiency of our proposal.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Benkrid, S., Bellatreche, L. (2020). A Framework for Designing Autonomous Parallel Data Warehouses. In: Wen, S., Zomaya, A., Yang, L.T. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11945. Springer, Cham. https://doi.org/10.1007/978-3-030-38961-1_9
DOI: https://doi.org/10.1007/978-3-030-38961-1_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38960-4
Online ISBN: 978-3-030-38961-1
eBook Packages: Computer Science, Computer Science (R0)