research-article

Cost/performance in modern data stores: how data caching systems succeed

Author:
David Lomet

Microsoft Research


DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware, June 2018, Article No.: 9, Pages 1–10, https://doi.org/10.1145/3211922.3211927

Published: 11 June 2018

ABSTRACT

Data in traditional "caching" data systems resides on secondary storage and is read into main memory only when operated on. This limits system performance. Main memory data stores, which keep data in main memory at all times, are much faster, but this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations, where data is not "in cache". We study the impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system [Deuteronomy and its Bw-tree] and a main memory system [MassTree] to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.

References

R. Appuswamy, R. Borovica-Gajic, G. Graefe, and A. Ailamaki: The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy. ADMS 2017.
Amazon Aurora. https://aws.amazon.com/rds/aurora/details/
R. Bayer and E. M. McCreight: Organization and Maintenance of Large Ordered Indices. Acta Inf. 1(1): 173--189 (1972).
J. DeBrabant, A. Pavlo, S. Tu, M. Stonebraker, S. Zdonik: Anti-Caching: A New Approach to Database Management System Architecture. PVLDB 6(14): 1942--1953 (2013).
LightNVM: The Linux Open-Channel SSD Subsystem. FAST 2017: 359--374.
P. Bonnet: What's Up with the Storage Hierarchy? CIDR 2017.
C. Diaconu, C. Freedman, E. Ismert, P. Larson, P. Mittal, R. Stonecipher, N. Verma, M. Zwilling: Hekaton: SQL server's memory-optimized OLTP engine. SIGMOD 2013: 1243--1254.
S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, M. Strum: Optimizing Space Amplification in RocksDB. CIDR 2017.
A. Eldawy, J. Levandoski, P. Larson: Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database. PVLDB 7(11): 931--942 (2014).
Flash file system. https://en.wikipedia.org/wiki/Flash_file_system
J. Gray, G. R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD 1987: 395--398.
J. Gray, G. Graefe: The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb. SIGMOD Record 26(4): 63--68 (1997).
J. Gray: Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King. jimgray.azurewebsites.net/talks/flash_is_good.ppt, 12, 2006.
IBM DB2. https://en.wikipedia.org/wiki/IBM_Db2
Intel: Introduction to the Storage Performance Development Kit (SPDK). https://software.intel.com/en-us/articles/introduction-to-the-storage-performance-development-kit-spdk
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi: H-Store: a High-Performance, Distributed Main Memory Transaction Processing System. PVLDB 1(2): 1496--1499 (2008).
A. Kemper, T. Neumann, J. Finis, F. Funke, V. Leis, H. Mühe, T. Mühlbauer, W. Rödiger: Processing in the Hybrid OLTP & OLAP Main-Memory Database System HyPer. IEEE Data Eng. Bull. 36(2): 41--47 (2013).
J. Lee, M. Müehle, N. May, F. Faerber, V. Sikka, H. Plattner, J. Krüger, M. Grund: High-Performance Transaction Processing in SAP HANA. IEEE Data Eng. Bull. 36(2): 28--33 (2013).
V. Leis, M. Haubenschild, A. Kemper, T. Neumann: LeanStore: In-Memory Data Management Beyond Main Memory. ICDE 2018.
J. Levandoski, D. Lomet, S. Sengupta: The Bw-Tree: A B-tree for New Hardware Platforms. ICDE 2013: 302--313.
J. Levandoski, D. Lomet, S. Sengupta: LLAMA: A Cache/Storage Subsystem for Modern Hardware. PVLDB 6(10): 877--888 (2013).
J. Levandoski, D. Lomet, S. Sengupta, R. Stutsman, R. Wang: High Performance Transactions in Deuteronomy. CIDR 2015.
Y. Mao, E. Kohler, R. T. Morris: Cache Craftiness for Fast Multicore Key-Value Storage. EuroSys 2012: 183--196.
Microsoft SQL Server. https://en.wikipedia.org/wiki/Microsoft_SQL_Server
P. E. O'Neil, E. Cheng, D. Gawlick, E. J. O'Neil: The Log-Structured Merge-Tree (LSM-Tree). Acta Inf. 33(4): 351--385 (1996).
Oracle Database. https://en.wikipedia.org/wiki/Oracle_Database
R. Sears, R. Ramakrishnan: bLSM: a general purpose log-structured merge tree. SIGMOD 2012: 217--228.
RocksDB: A persistent key-value store for fast storage environments. http://rocksdb.org/
LevelDB. http://leveldb.org/
M. Rosenblum and J. Ousterhout: The Design and Implementation of a Log-Structured File System. ACM Trans. Comput. Syst. 10(1): 26--52 (1992).
D. Shukla et al.: Schema-Agnostic Indexing with Azure DocumentDB. PVLDB 8(12): 1668--1679 (2015).
M. Stonebraker, A. Weisberg: The VoltDB Main Memory DBMS. IEEE Data Eng. Bull. 36(2): 21--27 (2013).
TPC: History and Overview of the TPC. http://www.tpc.org/information/about/history.asp
M. Stonebraker, U. Cetintemel: "One size fits all": an idea whose time has come and gone. ICDE 2005.
A. Ailamaki: Database Architectures for New Hardware. VLDB 2004.

Published in
DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware
June 2018
75 pages
ISBN: 9781450358538
DOI: 10.1145/3211922
Conference Chairs: Wolfgang Lehner (Technische Universität Dresden), Ken Salem (University of Waterloo)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 80 of 102 submissions, 78%