ABSTRACT
Data in traditional "caching" data systems resides on secondary storage and is read into main memory only when operated on. This limits system performance. Main-memory data stores, which always keep their data in main memory, are much faster, but this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations where data is not "in cache", and we study how cache misses affect the performance of caching systems. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system [Deuteronomy and its Bw-tree] and a main memory system [MassTree] to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to ways of achieving higher performance that do not rely simply on increasing the main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.