research-article

Aggregation strategies for columnar in-memory databases in a mixed workload

Authors:
Stephan Müller, Hasso-Plattner-Institut, Potsdam, Germany
Hasso Plattner, Hasso-Plattner-Institut, Potsdam, Germany

PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management, October 2011, Pages 51–58, https://doi.org/10.1145/2065003.2065015

Published: 28 October 2011

ABSTRACT

The recent trend towards analytics on operational data has led to the reunification of online transactional processing (OLTP) and online analytical processing (OLAP) in a single database. Columnar in-memory databases make this feasible because they perform expensive join and aggregation operations faster than traditional row-oriented databases. This has led to the radical proposal of abandoning materialized aggregate tables and calculating all aggregations on the fly.

This PhD research project investigates the factors that influence aggregation performance in columnar in-memory databases. Based on the identified factors, we aim to evaluate different cost model approaches and validate them with real-life data from large industry customers and their mixed workloads. The goal of this project is to design and implement an aggregation engine that decides whether or not it is beneficial to materialize an aggregate. The decision is based on the data and application characteristics, the historic and current workload, and other cost-relevant factors, and it weighs query performance against the maintenance costs of the aggregate view.
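The trade-off described above can be illustrated with a deliberately simplified sketch. It is not the cost model of the paper, which the abstract does not specify. It assumes a hypothetical linear model in which materializing pays off when the read cost saved exceeds the maintenance cost added by writes. All names (`AggregateWorkload`, `should_materialize`) and all cost parameters are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AggregateWorkload:
    """Hypothetical per-hour statistics for one candidate aggregate."""
    reads_per_hour: float       # analytical queries that need the aggregate
    writes_per_hour: float      # transactional writes touching the base data
    on_the_fly_cost_ms: float   # cost to compute the aggregate from base data
    lookup_cost_ms: float       # cost to read the materialized aggregate
    maintenance_cost_ms: float  # cost to update the aggregate per write


def cost_on_the_fly(w: AggregateWorkload) -> float:
    # Every read recomputes the aggregate; writes add no extra cost.
    return w.reads_per_hour * w.on_the_fly_cost_ms


def cost_materialized(w: AggregateWorkload) -> float:
    # Reads become cheap lookups, but every write must maintain the view.
    return (w.reads_per_hour * w.lookup_cost_ms
            + w.writes_per_hour * w.maintenance_cost_ms)


def should_materialize(w: AggregateWorkload) -> bool:
    # Assumed decision rule: materialize only if it lowers total cost.
    return cost_materialized(w) < cost_on_the_fly(w)


# Made-up numbers: a read-heavy and a write-heavy workload.
read_heavy = AggregateWorkload(1000, 100, 50.0, 1.0, 2.0)
write_heavy = AggregateWorkload(10, 100_000, 50.0, 1.0, 2.0)

print(should_materialize(read_heavy))   # materialized 1200 ms vs. 50000 ms
print(should_materialize(write_heavy))  # materialized 200010 ms vs. 500 ms
```

Under these assumed numbers the read-heavy workload favors materialization and the write-heavy one favors on-the-fly aggregation. A realistic model would replace the constant cost terms with estimates derived from data characteristics and the observed workload.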

Index Terms

1. Information systems
  1. Data management systems

Published in
PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
October 2011
100 pages
ISBN:9781450309530
DOI:10.1145/2065003
Program Chairs:
Anisoara Nica, Sybase, An SAP Company, Canada
Fabian M. Suchanek, INRIA, France
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
column store
cost model
data aggregation
in-memory database
materialized view
olap
oltp
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 25 of 62 submissions, 40%