DOI: 10.1145/2351476.2351480
research-article

TEEPA: a timely-aware elastic parallel architecture

Published: 08 August 2012

Abstract

Parallel Shared-Nothing architectures are frequently used to handle large star-schema Data Warehouses (DW). The continuous increase in data volume, combined with the star-schema storage organization, severely limits scalability because of the well-known parallel join issues. These issues force the use of solutions such as on-the-fly repartitioning of data or intermediate results, or massive replication of large data sets that still need to be joined locally, which constrains the ability of such architectures to deliver fast results. Parallelism may improve query performance; however, some business decisions require query results to be available in a timely manner, which cannot be guaranteed even with additional parallelism and significant upgrade costs (both monetary and due to the disturbance of normal operations). We propose a Timely-aware Execution Parallel Architecture (TEEPA), which balances data load and query processing among an elastic set of non-dedicated heterogeneous nodes in order to provide scale-out performance and timely query results. Data is allocated using adaptable storage models that minimize join costs (the major uncertainty factor) and best fit the nodes' capabilities, while preserving a consistent logical view of the star-schema. We present an experimental evaluation of TEEPA and demonstrate its ability to provide timely results.
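
The general idea of balancing data across heterogeneous nodes to meet a time target can be illustrated with a minimal sketch. This is not the TEEPA algorithm, which the abstract does not detail; it assumes each node has a known, constant scan rate in rows per second, and every name here (`allocate_rows`, `estimated_time`, `nodes_needed`, the node labels) is hypothetical. Rows are split in proportion to node speed so that all nodes finish at about the same time, and nodes are added until the estimated time fits the deadline.

```python
def allocate_rows(total_rows, node_rates):
    """Split total_rows across nodes in proportion to each node's rate.

    node_rates maps a node name to an assumed throughput (rows/second).
    Proportional shares let every node finish at roughly the same time.
    """
    total_rate = sum(node_rates.values())
    if total_rate <= 0:
        raise ValueError("at least one node must have a positive rate")
    shares = {n: int(total_rows * r / total_rate) for n, r in node_rates.items()}
    # Rows lost to rounding down are handed to the fastest node.
    fastest = max(node_rates, key=node_rates.get)
    shares[fastest] += total_rows - sum(shares.values())
    return shares


def estimated_time(shares, node_rates):
    """Estimated elapsed seconds: the slowest node bounds the result time."""
    return max((shares[n] / node_rates[n] for n in shares if shares[n] > 0),
               default=0.0)


def nodes_needed(total_rows, candidate_rates, deadline_s):
    """Add candidate nodes, fastest first, until the deadline is met.

    Returns the row allocation, or None if the deadline cannot be met
    even with every candidate node in use.
    """
    chosen = {}
    for name, rate in sorted(candidate_rates.items(), key=lambda kv: -kv[1]):
        chosen[name] = rate
        shares = allocate_rows(total_rows, chosen)
        if estimated_time(shares, chosen) <= deadline_s:
            return shares
    return None


if __name__ == "__main__":
    rates = {"fast_node": 300, "slow_node": 100}  # hypothetical rows/second
    print(nodes_needed(1000, rates, deadline_s=3.0))
```

A real system would also have to account for join costs and for nodes whose capacity changes over time, which this sketch ignores.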

Cited By

  • (2017) Scalability and Realtime on Big Data, MapReduce, NoSQL and Spark. Business Intelligence. DOI: 10.1007/978-3-319-61164-8_4 (pp. 79-104). Online publication date: 4-Jul-2017
  • (2013) Cloudy. Proceedings of the 17th International Database Engineering & Applications Symposium. DOI: 10.1145/2513591.2513659 (pp. 5-13). Online publication date: 9-Oct-2013

Published In

IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Symposium
August 2012
261 pages
ISBN:9781450312349
DOI:10.1145/2351476

Sponsors

  • Charles University
  • BytePress
  • Concordia University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data warehouse
  2. elastic parallel DW
  3. parallel shared-nothing
  4. star-schema model
  5. timely execution

Qualifiers

  • Research-article

Conference

IDEAS '12
Sponsor:
  • Charles University
  • Concordia University

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%
