Abstract
The process of designing a parallel data warehouse has two main steps: (1) fragmentation and (2) allocation of the resulting fragments to the various nodes. Usually, we split the data warehouse horizontally, allocate the fragments over the nodes, and finally balance the load over the nodes of the parallel machine. The main drawback of such a design approach is its high communication cost. Therefore, Data Replication (DR) has become a requirement, both for availability and for minimizing the communication cost. In this paper, we present a redundant allocation algorithm for designing shared-nothing parallel relational data warehouses, which is based on the well-known fuzzy k-means clustering algorithm.
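To illustrate how fuzzy clustering can yield a redundant allocation, the following is a minimal sketch and not the algorithm of the paper. It assumes each fragment is described by a hypothetical numeric feature vector (for example, how often each query accesses it), runs a plain fuzzy c-means with fuzzifier m, and places a fragment on every node whose membership degree reaches a hypothetical threshold. The function names, the feature vectors, and the threshold rule are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, k, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise the centers with k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_c (d_ij / d_ic)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if any(x == 0.0 for x in d):
                # The point coincides with a center: use crisp membership.
                hits = [1.0 if x == 0.0 else 0.0 for x in d]
                total = sum(hits)
                u.append([h / total for h in hits])
            else:
                u.append([
                    1.0 / sum((d[j] / d[c]) ** exponent for c in range(k))
                    for j in range(k)
                ])
        # Center update: mean of the points weighted by u_ij^m.
        for j in range(k):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            if total > 0.0:
                centers[j] = [
                    sum(w[i] * points[i][t] for i in range(len(points))) / total
                    for t in range(dim)
                ]
    return centers, u


def redundant_allocation(memberships, threshold=0.3):
    """Hypothetical placement rule: store a fragment on every node (cluster)
    whose membership is at least `threshold`; if none qualifies, fall back
    to the single node with the highest membership."""
    placement = []
    for row in memberships:
        nodes = [j for j, deg in enumerate(row) if deg >= threshold]
        if not nodes:
            nodes = [max(range(len(row)), key=row.__getitem__)]
        placement.append(nodes)
    return placement


if __name__ == "__main__":
    # Hypothetical fragment-by-query usage vectors (assumed data).
    fragments = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 0.9, 0.0, 0.1],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.1, 1.0, 0.9],
        [0.5, 0.5, 0.5, 0.5],
    ]
    _, u = fuzzy_c_means(fragments, k=2)
    for frag_id, nodes in enumerate(redundant_allocation(u)):
        print(f"fragment {frag_id} -> nodes {nodes}")
```

Because memberships are graded rather than exclusive, a fragment that is used by workloads tied to several nodes can cross the threshold for more than one of them and be replicated there, which is the intuition behind using a fuzzy rather than a hard clustering for redundant allocation.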
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Benkrid, S., Bellatreche, L., Cuzzocrea, A. (2014). Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_16
DOI: https://doi.org/10.1007/978-3-319-01863-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01862-1
Online ISBN: 978-3-319-01863-8
eBook Packages: Engineering, Engineering (R0)