Abstract
The DWS (Data Warehouse Striping) technique is a data partitioning approach designed specifically for distributed data warehousing environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and queries are executed in parallel by all of them, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically a heavy process, and it becomes even more complex in distributed environments. Data partitioning brings the need for new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation (vital to achieve low and uniform response times and, consequently, high performance during query execution). This paper evaluates several alternative algorithms and proposes a generic approach for the evaluation of data distribution algorithms in the context of DWS. The experimental results show that effective loading of the nodes in a DWS system must consider two complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables of each node, and splitting correlated rows among the nodes.
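The two quantities named in the abstract, balance of rows across nodes and the number of distinct dimension keys per node, can be illustrated with a small sketch. The code below is not one of the algorithms evaluated in the paper; the two placement functions, the `customer_id` column, and the sample rows are hypothetical and chosen only to show how such measurements might be taken. It does not model the second effect (correlation between rows), which a key-based placement would tend to work against.

```python
def round_robin(rows, n_nodes):
    """Hypothetical placement: row i goes to node i mod n_nodes."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_key_modulo(rows, n_nodes, key):
    """Hypothetical placement: all rows sharing a key value go to one node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def summarize(nodes, key):
    """Return (rows per node, distinct key values per node)."""
    sizes = [len(node) for node in nodes]
    distinct = [len({row[key] for row in node}) for node in nodes]
    return sizes, distinct


# Made-up fact rows: 6 customers with 2 rows each (assumption for illustration).
rows = [{"customer_id": c, "amount": 10 * c} for c in range(6) for _ in range(2)]

rr_summary = summarize(round_robin(rows, 3), "customer_id")
km_summary = summarize(by_key_modulo(rows, 3, "customer_id"), "customer_id")
print("round robin:", rr_summary)
print("key modulo: ", km_summary)
```

On this toy input both placements leave the same number of rows on every node, but the key-based placement leaves fewer distinct `customer_id` values per node. On real data the two goals may conflict, which is why the abstract argues that an evaluation should consider both.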
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Almeida, R., Vieira, J., Vieira, M., Madeira, H., Bernardino, J. (2008). Efficient Data Distribution for DWS. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_8
DOI: https://doi.org/10.1007/978-3-540-85836-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)