Abstract
In this paper we investigate the following problem: Given a database workload (tables and queries), which data layout (row, column or a suitable PAX-layout) should we choose in order to get the best possible performance? We show that this is not an easy problem. We explore careful combinations of various parameters that have an impact on the performance including: (1) the schema, (2) the CPU architecture, (3) the compiler, and (4) the optimization level. We include a CPU from each of the past 4 generations of Intel CPUs.
In addition, we demonstrate the importance of taking variance into account when deciding on the optimal storage layout. We observe considerable variance throughout our measurements which makes it difficult to argue along means over different runs of an experiment. Therefore, we compute confidence intervals for all measurements and exploit this to detect outliers and define classes of methods that we are not allowed to distinguish statistically. The variance of different performance measurements can be so significant that the optimal solution may not be the best one in practice.
Our results also indicate that a carefully or ill-chosen compilation setup can trigger a performance gain or loss of factor 1.1 to factor 25 in even the simplest workloads: a table with four attributes and a simple query reading those attributes. This latter observation is not caused by variance in the measured runtimes, but due to using a different compiler setup.
Besides the compilation setup, the data layout is another source of query time fragility. Various size metrics of the memory subsystem are round numbers in binary, or put more simply: powers of 2 in decimal. System engineers have followed this tradition over time. Surprisingly, there exists a use-case in query processing where using powers of 2 is always a suboptimal choice, leading to one more cause of fragile query times. Using this finding, we will show how to improve tuple-reconstruction costs by using a novel main-memory data-layout.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
min = min XOR ((temp XOR min) AND NEG(temp < min)).
- 2.
More precisely their C++ front-ends: clang++, g++, and icpc.
References
Ailamaki, A., et al.: Weaving relations for cache performance. In: VLDB 2001, pp. 169–180 (2001)
Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR, vol. 5, pp. 225–237 (2005)
Intel Corporation: Intel 64 and IA-32 Architectures Optimization Reference Manual
Grubbs, F.E.: Sample criteria for testing outlying observations. Ann. Math. Stat. 21, 27–58 (1950)
Grund, M., et al.: HYRISE: a main memory hybrid storage engine. PVLDB 4(2), 105–116 (2010)
Jain, R.: The Art of Computer Systems Performance Analysis. Wiley, New York (1991)
Jindal, A., Palatinus, E., Pavlov, V., Dittrich, J.: A comparison of knives for bread slicing. PVLDB 6(6), 361–372 (2013)
Patterson, D., Hennessy, J.: Computer Organization and Design, Fourth Edition: The Hardware/Software Interface. The Morgan Kaufmann Series in Computer Architecture and Design, 4th edn. Elsevier Science, Amsterdam (2008)
Pirk, H., et al.: CPU and cache efficient management of memory-resident databases. In: ICDE 2013, pp. 14–25 (2013)
TPC-H Standard Specification. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf
Welch, B.L.: The generalization of Student’s problem when several different population variances are involved. Biometrika 34, 28–35 (1947)
Zukowski, M., Boncz, P.A., Nes, N., Héman, S.: MonetDB/X100-A DBMS in the CPU cache. IEEE Data Eng. Bull. 28(2), 17–22 (2005)
Acknowledgments
Research supported by BMBF.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Palatinus, E., Dittrich, J. (2017). Runtime Fragility in Main Memory. In: Blanas, S., Bordawekar, R., Lahiri, T., Levandoski, J., Pavlo, A. (eds) Data Management on New Hardware. ADMS IMDM 2016 2016. Lecture Notes in Computer Science(), vol 10195. Springer, Cham. https://doi.org/10.1007/978-3-319-56111-0_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-56111-0_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56110-3
Online ISBN: 978-3-319-56111-0
eBook Packages: Computer ScienceComputer Science (R0)