Research Article · Public Access

SplinterDB and Maplets: Improving the Tradeoffs in Key-Value Store Compaction Policy

Published: 30 May 2023

Abstract

A critical aspect of modern key-value stores is the interaction between compaction policy and filters. Aggressive compaction reduces the on-disk footprint of a key-value store and can improve query performance, but it is expensive in both I/O and CPU, so it can reduce insertion throughput. Filters can mitigate the query costs of lazy compaction, but only if they fit in RAM, which limits the scalability of queries under lazy compaction. And, with fast storage devices, the CPU cost of querying filters in a lazily compacted system can be significant.
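To make this tradeoff concrete, here is a minimal Python sketch of a point lookup in an LSM-style store that keeps one filter per sorted run. It is not SplinterDB's or RocksDB's implementation: `BloomFilter`, `Run`, and `point_query` are illustrative names, and a dict stands in for on-disk data. The point it demonstrates is that under lazy compaction each level retains several runs, so every query pays one filter probe per run, and each probe can become an I/O if the filters do not fit in RAM.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hash probes into a bit array."""
    def __init__(self, nbits: int = 1 << 16, nhashes: int = 4):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _probes(self, key: bytes):
        for i in range(self.nhashes):
            h = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.nbits

    def add(self, key: bytes) -> None:
        for p in self._probes(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(key))

class Run:
    """One sorted run: its key-value data (standing in for on-disk pages)
    plus the per-run filter a query consults before reading the run."""
    def __init__(self, entries: dict):
        self.data = dict(entries)
        self.filter = BloomFilter()
        for key in entries:
            self.filter.add(key)

def point_query(key: bytes, levels):
    """Lookup across levels, newest first. Under lazy compaction each level
    holds several runs, so the query pays one filter probe per run (CPU, or
    I/O if the filter is not cached) before any data is read."""
    for runs in levels:
        for run in runs:
            if run.filter.may_contain(key):   # may be a false positive
                value = run.data.get(key)     # stands in for a disk read
                if value is not None:
                    return value
    return None

# Example: with two un-compacted runs on the second level, even a miss on
# b"k9" probes three filters -- probe cost grows with the run count.
levels = [
    [Run({b"k1": b"v1"})],
    [Run({b"k2": b"v2"}), Run({b"k3": b"v3"})],
]
assert point_query(b"k3", levels) == b"v3"
assert point_query(b"k9", levels) is None
```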
In this work, we present Mapped SplinterDB, a key-value store that achieves excellent insertion performance, query performance, space efficiency, and scalability by replacing filters with maplets, space-efficient data structures that act as lossy maps with false positives. Critically, we use quotient maplets, which can be merged and resized without access to the underlying data, enabling us to decouple compaction of the data from compaction of the quotient maplets. Thus Mapped SplinterDB can compact data lazily and quotient maplets aggressively, so that each level has multiple sorted runs of data but only one quotient maplet. Quotient maplets are so small that compacting them aggressively is still cheaper than compacting the (much larger) data lazily, so overall we get the insertion performance of a lazily compacted system. And, since there is only one quotient maplet to query on each level, we get the query performance of an aggressively compacted system. Furthermore, quotient maplets can accelerate queries even when they don't fit in RAM, improving scalability to huge datasets. We also show how to use quotient maplets to estimate when a compaction could resolve a high density of updates, enabling Mapped SplinterDB to perform targeted compactions for space recovery.
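As a rough illustration of the maplet abstraction, the sketch below (a toy in Python, not the paper's compact quotient-maplet layout) models a maplet as a lossy map from short key fingerprints to run indices. Fingerprint collisions produce false positives, inserted keys are never missed, and two maplets merge using fingerprints alone, without reading the underlying runs; the `stale_fraction` heuristic is a hypothetical stand-in for the paper's update-density estimate.

```python
import hashlib

FINGERPRINT_BITS = 16  # assumed size; real maplets tune this for the FP rate

def fingerprint(key: bytes) -> int:
    """Hash a key to a short fingerprint. Distinct keys can collide,
    which is exactly the maplet's false-positive mechanism."""
    h = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(h, "big") & ((1 << FINGERPRINT_BITS) - 1)

class Maplet:
    """Lossy map from key fingerprints to small values -- here, the
    indices of the sorted runs on a level that may hold the key."""
    def __init__(self):
        self.slots: dict[int, list[int]] = {}

    def insert(self, key: bytes, run_index: int) -> None:
        self.slots.setdefault(fingerprint(key), []).append(run_index)

    def query(self, key: bytes) -> list[int]:
        """Candidate runs for `key`: may include false positives
        (collisions), never false negatives."""
        return self.slots.get(fingerprint(key), [])

    def merge(self, other: "Maplet") -> None:
        """Fold another maplet in using fingerprints only -- no access to
        the underlying key-value data. This is the property that lets
        maplet compaction be decoupled from data compaction."""
        for fp, runs in other.slots.items():
            self.slots.setdefault(fp, []).extend(runs)

    def stale_fraction(self) -> float:
        """Hypothetical heuristic: fingerprints mapped to several runs
        suggest keys updated in multiple runs, i.e. space that a
        targeted compaction could reclaim."""
        multi = sum(1 for runs in self.slots.values() if len(runs) > 1)
        return multi / max(1, len(self.slots))

# A query consults the level's single maplet, then reads only the
# candidate runs instead of probing one filter per run.
level_maplet = Maplet()
level_maplet.insert(b"apple", run_index=0)
newer = Maplet()
newer.insert(b"apple", run_index=1)   # an update landing in a newer run
level_maplet.merge(newer)             # no data access needed
assert 1 in level_maplet.query(b"apple")
```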
In our benchmarks, Mapped SplinterDB matches the insertion performance of SplinterDB, a state-of-the-art lazily compacted system, and beats RocksDB, an aggressively compacted system, by up to 9×. On queries, Mapped SplinterDB outperforms SplinterDB and RocksDB by up to 89% and 83%, respectively, and scales gracefully to huge datasets. Mapped SplinterDB can dynamically trade update performance for space efficiency, yielding space overheads on update-heavy workloads as low as 15-61%, versus 80-117% for RocksDB and up to 137% for SplinterDB.

Supplemental Material

MP4 File: SIGMOD 2023 presentation video for "SplinterDB and Maplets: Improving the Tradeoffs in Key-Value Store Compaction Policy".




      Published In

      Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1
      May 2023, 2807 pages
      EISSN: 2836-6573
      DOI: 10.1145/3603164

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 May 2023
      Published in PACMMOD Volume 1, Issue 1


      Author Tags

      1. LSM trees
      2. data structures
      3. filters
      4. key-value stores
      5. maplets
      6. quotient filters

      Qualifiers

      • Research-article


      Article Metrics

      • Downloads (last 12 months): 433
      • Downloads (last 6 weeks): 90
      Reflects downloads up to 07 Mar 2025

      Cited By
      • Disco: A Compact Index for LSM-trees. Proceedings of the ACM on Management of Data 3(1), 1-27 (2025). https://doi.org/10.1145/3709683
      • Optimizing Collections of Bloom Filters within a Space Budget. Proceedings of the VLDB Endowment 17(11), 3551-3564 (2024). https://doi.org/10.14778/3681954.3682020
      • LSMGraph: A High-Performance Dynamic Graph Storage System with Multi-Level CSR. Proceedings of the ACM on Management of Data 2(6), 1-28 (2024). https://doi.org/10.1145/3698818
      • Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2(3), 1-26 (2024). https://doi.org/10.1145/3654978
      • GRF: A Global Range Filter for LSM-Trees with Shape Encoding. Proceedings of the ACM on Management of Data 2(3), 1-27 (2024). https://doi.org/10.1145/3654944
      • CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure. Proceedings of the ACM on Management of Data 2(3), 1-28 (2024). https://doi.org/10.1145/3654927
      • Beyond Bloom: A Tutorial on Future Feature-Rich Filters. Companion of the 2024 International Conference on Management of Data, 636-644 (2024). https://doi.org/10.1145/3626246.3654681
      • A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries. Microprocessors & Microsystems 105 (2024). https://doi.org/10.1016/j.micpro.2023.104992
