Skip to main content

Runtime Fragility in Main Memory

  • Conference paper
  • First Online:
Data Management on New Hardware (ADMS 2016, IMDM 2016)

Abstract

In this paper we investigate the following problem: Given a database workload (tables and queries), which data layout (row, column or a suitable PAX-layout) should we choose in order to get the best possible performance? We show that this is not an easy problem. We explore careful combinations of various parameters that have an impact on the performance including: (1) the schema, (2) the CPU architecture, (3) the compiler, and (4) the optimization level. We include a CPU from each of the past 4 generations of Intel CPUs.

In addition, we demonstrate the importance of taking variance into account when deciding on the optimal storage layout. We observe considerable variance throughout our measurements which makes it difficult to argue along means over different runs of an experiment. Therefore, we compute confidence intervals for all measurements and exploit this to detect outliers and define classes of methods that we are not allowed to distinguish statistically. The variance of different performance measurements can be so significant that the optimal solution may not be the best one in practice.

Our results also indicate that a carefully or ill-chosen compilation setup can trigger a performance gain or loss of factor 1.1 to factor 25 in even the simplest workloads: a table with four attributes and a simple query reading those attributes. This latter observation is not caused by variance in the measured runtimes, but due to using a different compiler setup.

Besides the compilation setup, the data layout is another source of query time fragility. Various size metrics of the memory subsystem are round numbers in binary, or put more simply: powers of 2 in decimal. System engineers have followed this tradition over time. Surprisingly, there exists a use-case in query processing where using powers of 2 is always a suboptimal choice, leading to one more cause of fragile query times. Using this finding, we will show how to improve tuple-reconstruction costs by using a novel main-memory data-layout.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    min = min XOR ((temp XOR min) AND NEG(temp < min)).

  2. 2.

    More precisely their C++ front-ends: clang++, g++, and icpc.

References

  1. Ailamaki, A., et al.: Weaving relations for cache performance. In: VLDB 2001, pp. 169–180 (2001)

    Google Scholar 

  2. Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR, vol. 5, pp. 225–237 (2005)

    Google Scholar 

  3. Intel Corporation: Intel 64 and IA-32 Architectures Optimization Reference Manual

    Google Scholar 

  4. Grubbs, F.E.: Sample criteria for testing outlying observations. Ann. Math. Stat. 21, 27–58 (1950)

    Article  MathSciNet  MATH  Google Scholar 

  5. Grund, M., et al.: HYRISE: a main memory hybrid storage engine. PVLDB 4(2), 105–116 (2010)

    MathSciNet  Google Scholar 

  6. Jain, R.: The Art of Computer Systems Performance Analysis. Wiley, New York (1991)

    MATH  Google Scholar 

  7. Jindal, A., Palatinus, E., Pavlov, V., Dittrich, J.: A comparison of knives for bread slicing. PVLDB 6(6), 361–372 (2013)

    Google Scholar 

  8. Patterson, D., Hennessy, J.: Computer Organization and Design, Fourth Edition: The Hardware/Software Interface. The Morgan Kaufmann Series in Computer Architecture and Design, 4th edn. Elsevier Science, Amsterdam (2008)

    Google Scholar 

  9. Pirk, H., et al.: CPU and cache efficient management of memory-resident databases. In: ICDE 2013, pp. 14–25 (2013)

    Google Scholar 

  10. TPC-H Standard Specification. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf

  11. Welch, B.L.: The generalization of Student’s problem when several different population variances are involved. Biometrika 34, 28–35 (1947)

    MathSciNet  MATH  Google Scholar 

  12. Zukowski, M., Boncz, P.A., Nes, N., Héman, S.: MonetDB/X100-A DBMS in the CPU cache. IEEE Data Eng. Bull. 28(2), 17–22 (2005)

    Google Scholar 

Download references

Acknowledgments

Research supported by BMBF.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Endre Palatinus .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Palatinus, E., Dittrich, J. (2017). Runtime Fragility in Main Memory. In: Blanas, S., Bordawekar, R., Lahiri, T., Levandoski, J., Pavlo, A. (eds) Data Management on New Hardware. ADMS IMDM 2016 2016. Lecture Notes in Computer Science(), vol 10195. Springer, Cham. https://doi.org/10.1007/978-3-319-56111-0_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-56111-0_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56110-3

  • Online ISBN: 978-3-319-56111-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics