ABSTRACT
The recent trend towards analytics on operational data has led to the reunification of online transactional processing (OLTP) and online analytical processing (OLAP) in a single database. The advent of columnar in-memory databases makes this feasible, as expensive join and aggregation operations can be performed with better performance than in traditional row-oriented databases. This has led to the radical proposal of abandoning materialized aggregate tables and calculating all aggregations on the fly.
This PhD research project investigates the factors that influence aggregation performance in columnar in-memory databases. Based on the identified factors, we aim to evaluate different cost model approaches, which are subject to validation with real-life data from large industry customers and their mixed workloads. The goal of this project is the design and implementation of an aggregation engine that decides whether or not to materialize an aggregate. The decision is based on the data and application characteristics, the historic and current workload, and other cost-relevant factors, and it weighs the benefit in query performance against the cost of aggregate view maintenance.
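The trade-off behind the materialization decision can be illustrated with a minimal sketch. It is not the cost model proposed in this project: the linear cost formula, the `WorkloadStats` fields, and all numeric values are assumptions chosen purely for illustration. The sketch compares the estimated cost of computing an aggregate on the fly for every read against the cost of reading a materialized aggregate plus maintaining it on every write.

```python
from dataclasses import dataclass


@dataclass
class WorkloadStats:
    """Hypothetical per-aggregate statistics for one observation window."""
    reads: int                # aggregate queries issued in the window
    writes: int               # base-table changes that affect the aggregate
    on_the_fly_cost: float    # assumed cost of computing the aggregate per query
    lookup_cost: float        # assumed cost of reading the materialized aggregate
    maintenance_cost: float   # assumed cost of updating the aggregate per write


def should_materialize(stats: WorkloadStats) -> bool:
    """Return True if materializing is estimated to be cheaper in this window."""
    # Without materialization, every read recomputes the aggregate.
    cost_on_the_fly = stats.reads * stats.on_the_fly_cost
    # With materialization, reads are cheap but every write pays maintenance.
    cost_materialized = (
        stats.reads * stats.lookup_cost
        + stats.writes * stats.maintenance_cost
    )
    return cost_materialized < cost_on_the_fly


# Illustrative values only: a read-heavy and a write-heavy workload.
read_heavy = WorkloadStats(reads=1000, writes=10, on_the_fly_cost=5.0,
                           lookup_cost=0.1, maintenance_cost=2.0)
write_heavy = WorkloadStats(reads=10, writes=1000, on_the_fly_cost=5.0,
                            lookup_cost=0.1, maintenance_cost=2.0)

print(should_materialize(read_heavy))
print(should_materialize(write_heavy))
```

Under these assumed numbers, the read-heavy workload favors materialization and the write-heavy workload favors on-the-fly aggregation. A realistic model would replace the constant per-operation costs with estimates derived from the data, application, and workload characteristics named above.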