Runtime Fragility in Main Memory

Palatinus, Endre; Dittrich, Jens

doi:10.1007/978-3-319-56111-0_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10195)

Abstract

In this paper we investigate the following problem: Given a database workload (tables and queries), which data layout (row, column or a suitable PAX-layout) should we choose in order to get the best possible performance? We show that this is not an easy problem. We explore careful combinations of various parameters that have an impact on the performance including: (1) the schema, (2) the CPU architecture, (3) the compiler, and (4) the optimization level. We include a CPU from each of the past 4 generations of Intel CPUs.
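To make the three candidate layouts concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical table with four 32-bit integer attributes `a` to `d` (the abstract does not give attribute types), a hypothetical page capacity `kTuplesPerPage` for the PAX variant, and a simple query that sums all four attributes. All type and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row layout: the four attributes of one tuple are adjacent in memory.
struct RowTuple {
    std::int32_t a, b, c, d;
};
using RowTable = std::vector<RowTuple>;

// Column layout: each attribute is stored in its own contiguous array.
struct ColumnTable {
    std::vector<std::int32_t> a, b, c, d;
};

// PAX layout: tuples are grouped into pages; inside a page each attribute
// forms a contiguous mini-column. The capacity is an assumed value.
constexpr std::size_t kTuplesPerPage = 1024;
struct PaxPage {
    std::int32_t a[kTuplesPerPage], b[kTuplesPerPage],
                 c[kTuplesPerPage], d[kTuplesPerPage];
    std::size_t count;  // number of valid tuples in this page
};
using PaxTable = std::vector<PaxPage>;

// The same query (sum of all four attributes) against each layout.
std::int64_t sum_rows(const RowTable& t) {
    std::int64_t sum = 0;
    for (const RowTuple& r : t)
        sum += std::int64_t{r.a} + r.b + r.c + r.d;
    return sum;
}

std::int64_t sum_columns(const ColumnTable& t) {
    assert(t.b.size() == t.a.size() && t.c.size() == t.a.size() &&
           t.d.size() == t.a.size());
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < t.a.size(); ++i)
        sum += std::int64_t{t.a[i]} + t.b[i] + t.c[i] + t.d[i];
    return sum;
}

std::int64_t sum_pax(const PaxTable& t) {
    std::int64_t sum = 0;
    for (const PaxPage& p : t)
        for (std::size_t i = 0; i < p.count; ++i)
            sum += std::int64_t{p.a[i]} + p.b[i] + p.c[i] + p.d[i];
    return sum;
}
```

All three functions compute the same result; they differ only in the memory access pattern, which is the variable the layout choice controls.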

In addition, we demonstrate the importance of taking variance into account when deciding on the optimal storage layout. We observe considerable variance throughout our measurements which makes it difficult to argue along means over different runs of an experiment. Therefore, we compute confidence intervals for all measurements and exploit this to detect outliers and define classes of methods that we are not allowed to distinguish statistically. The variance of different performance measurements can be so significant that the optimal solution may not be the best one in practice.
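The abstract does not say which statistical procedure is used, so the following is only a sketch of one common approach: a confidence interval for the mean runtime of each method, and Welch's unequal-variance t-test to decide whether two methods can be told apart. The critical value `t_crit` is an assumption that the caller must supply (for example from a t-table for the chosen confidence level and degrees of freedom), because the C++ standard library has no t-quantile function. All names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;
    double var;      // unbiased sample variance
    std::size_t n;   // number of runs
};

// Mean and sample variance of repeated runtime measurements (needs n >= 2).
Summary summarize(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= n;
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return {mean, ss / (n - 1.0), x.size()};
}

struct Interval {
    double lo, hi;
};

// Confidence interval for the mean: mean +/- t_crit * sqrt(var / n).
Interval confidence_interval(const Summary& s, double t_crit) {
    const double half = t_crit * std::sqrt(s.var / static_cast<double>(s.n));
    return {s.mean - half, s.mean + half};
}

struct WelchResult {
    double t;   // test statistic
    double df;  // Welch-Satterthwaite degrees of freedom
};

WelchResult welch(const Summary& a, const Summary& b) {
    const double va = a.var / static_cast<double>(a.n);
    const double vb = b.var / static_cast<double>(b.n);
    const double t = (a.mean - b.mean) / std::sqrt(va + vb);
    const double df = (va + vb) * (va + vb) /
                      (va * va / static_cast<double>(a.n - 1) +
                       vb * vb / static_cast<double>(b.n - 1));
    return {t, df};
}

// Two methods fall into the same class if the test cannot separate them.
bool distinguishable(const Summary& a, const Summary& b, double t_crit) {
    return std::fabs(welch(a, b).t) > t_crit;
}
```

Methods for which `distinguishable` returns false would be grouped into one class under this sketch, so a small difference in mean runtime would not be reported as a win for either.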

Our results also indicate that a carefully or ill-chosen compilation setup can cause a performance gain or loss by a factor of 1.1 to 25, even in the simplest workloads: a table with four attributes and a simple query reading those attributes. This observation is not caused by variance in the measured runtimes but by the use of a different compiler setup.

Besides the compilation setup, the data layout is another source of query time fragility. Various size metrics of the memory subsystem are round numbers in binary, or put more simply: powers of 2 in decimal. System engineers have followed this tradition over time. Surprisingly, there exists a use-case in query processing where using powers of 2 is always a suboptimal choice, leading to one more cause of fragile query times. Using this finding, we will show how to improve tuple-reconstruction costs by using a novel main-memory data-layout.
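The abstract does not name the mechanism behind the power-of-2 effect. One well-known effect that fits the description is cache-set aliasing in set-associative caches: addresses that are a power-of-2 distance apart can map to the same cache set. The sketch below illustrates only that general effect and is an assumption, not the paper's stated explanation. The cache geometry (64-byte lines, 64 sets) and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical cache geometry; real values depend on the CPU.
struct CacheGeometry {
    std::size_t line_bytes;
    std::size_t num_sets;
};

// Set index of an address in a simple set-associative cache model.
inline std::size_t set_index(std::uintptr_t addr, CacheGeometry g) {
    assert(g.line_bytes > 0 && g.num_sets > 0);
    return (addr / g.line_bytes) % g.num_sets;
}

// Number of distinct cache sets touched when reading the same position
// in `columns` arrays whose start addresses are `stride_bytes` apart.
std::size_t distinct_sets(std::uintptr_t base, std::size_t stride_bytes,
                          std::size_t columns, CacheGeometry g) {
    std::set<std::size_t> sets;
    for (std::size_t c = 0; c < columns; ++c)
        sets.insert(set_index(base + c * stride_bytes, g));
    return sets.size();
}
```

In this model, 16 columns placed exactly 1 MiB apart all land in a single set, whereas shifting each column by one extra cache line spreads them over 16 sets. Reconstructing a tuple from many columns would therefore compete for far fewer cache ways in the power-of-2 case.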

Notes

1. min = min XOR ((temp XOR min) AND NEG(temp < min)).
2. More precisely, their C++ front-ends: clang++, g++, and icpc.
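The formula in note 1 can be written directly in C++. The sketch below assumes that NEG denotes two's-complement negation, so that NEG(temp < min) is either 0 or an all-ones mask. The function names are illustrative and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branch-free minimum following note 1:
//   min = min XOR ((temp XOR min) AND NEG(temp < min))
// The comparison yields 0 or 1; negating it yields 0 or -1 (all bits set).
inline std::int32_t branch_free_min(std::int32_t current_min,
                                    std::int32_t temp) {
    const std::int32_t mask = -static_cast<std::int32_t>(temp < current_min);
    // mask == -1: current_min ^ temp ^ current_min == temp
    // mask ==  0: current_min is returned unchanged
    return current_min ^ ((temp ^ current_min) & mask);
}

// Minimum of a non-empty array without a data-dependent branch in the
// loop body.
std::int32_t min_of(const std::vector<std::int32_t>& values) {
    assert(!values.empty());
    std::int32_t m = values[0];
    for (std::size_t i = 1; i < values.size(); ++i)
        m = branch_free_min(m, values[i]);
    return m;
}
```

Whether this form is faster than a plain comparison depends on the compiler and optimization level, since a compiler may emit a conditional move for either version.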

Acknowledgments

Research supported by BMBF.

Author information

Authors and Affiliations

Information Systems Group, Saarland University, Saarbrücken, Germany
Endre Palatinus & Jens Dittrich

Authors

Endre Palatinus
Jens Dittrich

Corresponding author

Correspondence to Endre Palatinus.

Editor information

Editors and Affiliations

Ohio State University, Columbus, Ohio, USA
Spyros Blanas
IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA
Rajesh Bordawekar
Oracle Corp., Redwood Shores, California, USA
Tirthankar Lahiri
Computer Science Department, Microsoft Corporation, Redmond, Washington, USA
Justin Levandoski
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Andrew Pavlo

About this paper

Cite this paper

Palatinus, E., Dittrich, J. (2017). Runtime Fragility in Main Memory. In: Blanas, S., Bordawekar, R., Lahiri, T., Levandoski, J., Pavlo, A. (eds) Data Management on New Hardware. ADMS IMDM 2016. Lecture Notes in Computer Science, vol 10195. Springer, Cham. https://doi.org/10.1007/978-3-319-56111-0_9

DOI: https://doi.org/10.1007/978-3-319-56111-0_9
Published: 23 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56110-3
Online ISBN: 978-3-319-56111-0
eBook Packages: Computer Science, Computer Science (R0)
