DOI: 10.1145/2076623.2076628
research-article

A predictable storage model for scalable parallel DW

Published: 21 September 2011

Abstract

The star schema model has been widely used as the de facto DW storage organization on RDBMSs. Business measures are stored in a central fact table along with a set of foreign keys referencing dimension tables. While this storage organization offers a good trade-off between storage size and performance on a single node, it does not scale predictably in shared-nothing parallel architectures. Although fact tables can be linearly partitioned among nodes, the same does not apply to dimensions, which unbalances (increases) the dimensions/fact_table size ratio and consequently limits the number of parallel nodes. In this paper we propose and evaluate a parallel DW storage model that overcomes these limitations and delivers optimal speed-up and scale-up capabilities with top efficiency. We use the TPC-H benchmark to evaluate the scalability and efficiency of the proposed model.
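The scaling limit described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the fact table is split evenly across nodes and that every dimension is fully replicated on each node (one common way to handle dimensions that cannot be partitioned). The table sizes and the function name `per_node_ratio` are hypothetical.

```python
def per_node_ratio(fact_gb: float, dims_gb: float, nodes: int) -> float:
    """Dimensions/fact_table size ratio seen by a single node.

    Assumptions (not from the paper): the fact table is partitioned
    evenly across nodes, and all dimensions are replicated on every node.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    fact_per_node = fact_gb / nodes   # fact data shrinks linearly per node
    return dims_gb / fact_per_node    # dimension data per node stays constant


# Hypothetical sizes, chosen only to make the trend visible.
FACT_GB = 1000.0
DIMS_GB = 50.0

ratios = {n: per_node_ratio(FACT_GB, DIMS_GB, n) for n in (1, 10, 20, 100)}
for n, r in ratios.items():
    print(f"{n:>4} nodes: dimensions/fact ratio per node = {r:.2f}")
```

Under these assumptions the ratio grows linearly with the node count: with the sizes above, each node at 20 nodes holds as much dimension data as fact data, and beyond that point most of a node's storage goes to replicated dimensions rather than to its fact partition.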

Published In

ACM Other conferences
IDEAS '11: Proceedings of the 15th Symposium on International Database Engineering & Applications
September 2011
274 pages
ISBN:9781450306270
DOI:10.1145/2076623
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DW
  2. parallel DW
  3. shared nothing

Qualifiers

  • Research-article

Conference

IDEAS '11

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 01 Mar 2025

Cited By

  • (2015) Data Warehouse Processing Scale-Up for Massive Concurrent Queries with SPIN. Transactions on Large-Scale Data- and Knowledge-Centered Systems XVII, pp. 1-23. DOI: 10.1007/978-3-662-46335-2_1. Online publication date: 30-Jan-2015
  • (2013) Cloudy. Proceedings of the 17th International Database Engineering & Applications Symposium, pp. 5-13. DOI: 10.1145/2513591.2513659. Online publication date: 9-Oct-2013
  • (2013) SPIN. Proceedings of the 15th International Conference on Data Warehousing and Knowledge Discovery - Volume 8057, pp. 60-71. DOI: 10.1007/978-3-642-40131-2_6. Online publication date: 26-Aug-2013
  • (2012) TEEPA. Proceedings of the 16th International Database Engineering & Applications Symposium, pp. 24-31. DOI: 10.1145/2351476.2351480. Online publication date: 8-Aug-2012
