DOI: 10.1145/2065003.2065015
research-article

Aggregation strategies for columnar in-memory databases in a mixed workload

Published: 28 October 2011

ABSTRACT

The recent trend towards analytics on operational data has led to the reunification of online transactional processing (OLTP) and online analytical processing (OLAP) in a single database. The advent of columnar in-memory databases makes this feasible, as expensive join and aggregation operations perform better than in traditional row-oriented databases. This has led to the radical proposal of abandoning materialized aggregate tables and calculating all aggregations on the fly.
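
To make the two alternatives concrete, the following is a minimal sketch, not taken from the paper, that contrasts computing an aggregate on the fly with reading a materialized aggregate that is maintained on every insert. The toy table (the `region` and `amount` columns), the function `aggregate_on_the_fly`, and the class `MaterializedSum` are hypothetical names chosen for illustration, and plain Python lists stand in for a columnar store.

```python
from collections import defaultdict

# Columnar layout: one list per attribute (hypothetical toy table).
region = ["EU", "US", "EU", "APJ", "US"]
amount = [100, 250, 50, 75, 25]


def aggregate_on_the_fly(keys, values):
    """Scan both columns and compute SUM(values) GROUP BY keys at query time."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


class MaterializedSum:
    """Precomputed SUM ... GROUP BY that must be maintained on every insert."""

    def __init__(self, keys, values):
        self.totals = aggregate_on_the_fly(keys, values)

    def on_insert(self, key, value):
        # Maintenance cost: each transactional write also updates the aggregate.
        self.totals[key] = self.totals.get(key, 0) + value

    def read(self):
        # Query cost: a lookup of the stored totals, no scan of the base columns.
        return dict(self.totals)


materialized = MaterializedSum(region, amount)

# A new transaction arrives: append to the base columns and maintain the view.
region.append("EU")
amount.append(10)
materialized.on_insert("EU", 10)

# Both strategies must return the same result; they differ only in where
# the work is done (at query time versus at write time).
on_the_fly = aggregate_on_the_fly(region, amount)
```

In this sketch the on-the-fly variant pays for a full column scan on every query, while the materialized variant pays a small extra cost on every write.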

This PhD research project investigates the factors that influence aggregation performance in columnar in-memory databases. Based on the identified factors, we aim to evaluate different cost model approaches and validate them with real-life data from large industry customers and their mixed workloads. The goal of the project is to design and implement an aggregation engine that decides whether or not to materialize an aggregate. The decision is based on data and application characteristics, the historic and current workload, and other cost-relevant factors, and it weighs the benefit for query performance against the cost of maintaining the aggregate view.
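
The kind of decision such an engine has to make can be sketched as a simple cost comparison. The abstract does not specify a cost model, so everything below is an assumption made for illustration: the `WorkloadStats` fields, the linear cost formula, and the function `should_materialize` are hypothetical, and the costs are abstract units rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class WorkloadStats:
    """Hypothetical per-aggregate statistics observed over a time window."""
    reads: int                      # number of aggregate queries
    writes: int                     # number of inserts/updates on the base table
    on_the_fly_cost: float          # estimated cost of one on-the-fly aggregation
    materialized_read_cost: float   # estimated cost of one read of the stored aggregate
    maintenance_cost: float         # estimated cost of updating the aggregate per write


def should_materialize(stats: WorkloadStats) -> bool:
    """Return True if materializing is estimated to be cheaper overall.

    Assumed linear model: total cost is the sum of per-operation costs.
    """
    cost_on_the_fly = stats.reads * stats.on_the_fly_cost
    cost_materialized = (
        stats.reads * stats.materialized_read_cost
        + stats.writes * stats.maintenance_cost
    )
    return cost_materialized < cost_on_the_fly


# Read-heavy workload: many aggregate queries, few writes.
read_heavy = WorkloadStats(
    reads=1000, writes=100,
    on_the_fly_cost=5.0, materialized_read_cost=0.1, maintenance_cost=0.5,
)

# Write-heavy workload: few aggregate queries, many writes.
write_heavy = WorkloadStats(
    reads=10, writes=100_000,
    on_the_fly_cost=5.0, materialized_read_cost=0.1, maintenance_cost=0.5,
)

decision_read_heavy = should_materialize(read_heavy)
decision_write_heavy = should_materialize(write_heavy)
```

Under these assumed numbers, the read-heavy workload favors materialization and the write-heavy workload favors on-the-fly aggregation. A real cost model would presumably also account for the data characteristics the abstract mentions, which this sketch ignores.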


Published in

PIKM '11: Proceedings of the 4th Workshop for Ph.D. Students in Information & Knowledge Management
October 2011
100 pages
ISBN: 9781450309530
DOI: 10.1145/2065003

Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 25 of 62 submissions, 40%
