skip to main content
10.1145/3318464.3383133acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
short-paper

Key-Value Storage Engines

Published:31 May 2020Publication History

ABSTRACT

Key-value stores are everywhere. They power a diverse set of data-driven applications across both industry and science. Key-value stores are used as stand-alone NoSQL systems but they are also used as a part of more complex pipelines and systems such as machine learning and relational systems. In this tutorial, we survey the state-of-the-art approaches on how the core storage engine of a key-value store system is designed. We focus on several critical components of the engine, starting with the core data structures to lay out data across the memory hierarchy. We also discuss design issues related to caching, timestamps, concurrency control, updates, shifting workloads, as well as mixed workloads with both analytical and transactional characteristics. We cover designs that are read-optimized, write-optimized as well as hybrids. We draw examples from several state-of-the-art systems but we also put everything together in a general framework which allows us to model storage engine designs under a single unified model and reason about the expected behavior of diverse designs. In addition, we show that given the vast number of possible storage engine designs and their complexity, there is a need to be able to describe and communicate design decisions at a high level descriptive language and we present a first version of such a language. We then use that framework to present several open challenges in the field, especially in terms of supporting increasingly more diverse and dynamic applications in the era of data science and AI, including neural networks, graphs, and data versioning.

References

  1. Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, and Rina Panigrahy. 2008. Design Tradeoffs for SSD Performance. In Proceedings of the USENIX Annual Technical Conference (ATC). 57--70. http://research.microsoft.com/pubs/63596/usenix-08-ssd.pdf http://dl.acm.org/citation.cfm?id=1404019Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Jung-Sang Ahn, Chiyoung Seo, Ravi Mayuram, Rahim Yaseen, Jin-Soo Kim, and Seungryoul Maeng. 2016. ForestDB: A Fast Key-Value Storage System for Variable-Length String Keys. IEEE Transactions on Computers (TC), Vol. 65, 3 (2016), 902--915. https://doi.org/10.1109/TC.2015.2435779Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Dana Van Aken, Andrew Pavlo, Geoffrey J Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1009--1024. https://doi.org/10.1145/3035918.3064029Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: A Hands-free Adaptive Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1103--1114. https://doi.org/10.1145/2588555.2610502Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Victor Alvarez, Felix Martin Schuhknecht, Jens Dittrich, and Stefan Richter. 2014. Main Memory Adaptive Indexing for Multi-Core Systems. In Proceedings of the International Workshop on Data Management on New Hardware (DAMON). 3:1---3:10. https://doi.org/10.1145/2619228.2619231Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Michael R. Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael J. Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang. 2013. Brainwash: A Data System for Feature Engineering. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://web.eecs.umich.edu/ mrander/pubs/mythical_man.pdf http://cidrdb.org/cidr2013/Papers/CIDR13_Paper82.pdfGoogle ScholarGoogle Scholar
  7. Apache. [n. d.]. Accumulo. https://accumulo.apache.org/ ([n. d.]).Google ScholarGoogle Scholar
  8. Apple. 2018. FoundationDB. https://github.com/apple/foundationdb (2018).Google ScholarGoogle Scholar
  9. Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: a Database Benchmark Based on the Facebook Social Graph. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1185--1196. https://doi.org/10.1145/2463676.2465296Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/2882903.2915231Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT). 461--466. http://dx.doi.org/10.5441/002/edbt.2016.42Google ScholarGoogle Scholar
  12. Shivnath Babu, Nedyalko Borisov, Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala. 2009. Automated Experiment-Driven Management of (Database) Systems. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS). http://www.usenix.org/events/hotos09/tech/full_papers/babu/babu.pdfGoogle ScholarGoogle Scholar
  13. Rudolf Bayer and Edward M. McCreight. 1970. Organization and Maintenance of Large Ordered Indices. In Proceedings of the ACM SIGFIDET Workshop on Data Description and Access, Vol. 1. 107--141. https://doi.org/10.1007/BF00288683Google ScholarGoogle Scholar
  14. Yingyi Bu, Vinayak R. Borkar, Jianfeng Jia, Michael J. Carey, and Tyson Condie. 2014. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. Proceedings of the VLDB Endowment, Vol. 8, 2 (2014), 161--172. https://doi.org/10.14778/2735471.2735477Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Zhao Cao, Shimin Chen, Feifei Li, Min Wang, and Xiaoyang Sean Wang. 2013. LogKV: Exploiting Key-Value Stores for Log Processing. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://cidrdb.org/cidr2013/Papers/CIDR13_Paper46.pdfGoogle ScholarGoogle Scholar
  16. Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin J Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 275--290. https://doi.org/10.1145/3183713.3196898Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2006. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). 205--218. http://dl.acm.org/citation.cfm?id=1267308.1267323Google ScholarGoogle Scholar
  18. Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W Lee, Ashish Motivala, Abdul Q Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. 2016. The Snowflake Elastic Data Warehouse. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 215--226. https://doi.org/10.1145/2882903.2903741Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 79--94. https://doi.org/10.1145/3035918.3064054Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2018. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS), Vol. 43, 4 (2018), 16:1--16:48.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Niv Dayan, Philippe Bonnet, and Stratos Idreos. 2016. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 327--342. https://doi.org/10.1145/2882903.2915219Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 505--520. https://doi.org/10.1145/3183713.3196927Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Niv Dayan and Stratos Idreos. 2019. The Log-Structured Merge-Bush & the Wacky Continuum. In Proceedings of the ACM SIGMOD International Conference on Management of Data .Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, Vol. 41, 6 (2007), 205--220. https://doi.org/10.1145/1323293.1294281Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jens Dittrich and Alekh Jindal. 2011. Towards a One Size Fits All Database Architecture. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). 195--198.Google ScholarGoogle Scholar
  26. Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdfGoogle ScholarGoogle Scholar
  27. Facebook. [n. d.]. RocksDB. https://github.com/facebook/rocksdb ([n. d.]).Google ScholarGoogle Scholar
  28. Michael J Franklin. 1993. Caching and Memory Management in Client-Server Database Systems. Ph.D. Dissertation. University of Wisconsin-Madison.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. 2015. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys). 32:1--32:14. https://doi.org/10.1145/2741948.2741973Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Google. [n. d.]. LevelDB. https://github.com/google/leveldb/ ([n. d.]).Google ScholarGoogle Scholar
  31. Goetz Graefe, Felix Halim, Stratos Idreos, Harumi Kuno, and Stefan Manegold. 2012. Concurrency control for adaptive indexing. Proceedings of the VLDB Endowment, Vol. 5, 7 (2012), 656--667. http://dl.acm.org/citation.cfm?id=2180918Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Richard A Hankins and Jignesh M Patel. 2003. Data Morphing: An Adaptive, Cache-Conscious Storage Technique. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 417--428. http://www.vldb.org/conf/2003/papers/S13P03.pdfGoogle ScholarGoogle Scholar
  33. HBase. 2013. Online reference. http://hbase.apache.org/ (2013).Google ScholarGoogle Scholar
  34. Max Heimel, Martin Kiefer, and Volker Markl. 2015. Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1477--1492. https://doi.org/10.1145/2723372.2749438Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In Biennial Conference on Innovative Data Systems Research (CIDR) .Google ScholarGoogle Scholar
  36. Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) .Google ScholarGoogle Scholar
  37. Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2009. Self-organizing Tuple Reconstruction in Column-Stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 297--308. https://doi.org/10.1145/1559845.1559878Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Stratos Idreos, Lukas M Maas, and Mike S Kester. 2017. Evolutionary Data Systems. CoRR, Vol. abs/1706.0 (2017). arxiv: 1706.05714Google ScholarGoogle Scholar
  39. Stratos Idreos, Kostas Zoumpatianos, Manos Athanassoulis, Niv Dayan, Brian Hentschel, Michael S. Kester, Demi Guo, Lukas M. Maas, Wilson Qin, Abdul Wasay, and Yiyou Sun. 2018a. The Periodic Table of Data Structures. IEEE Data Engineering Bulletin, Vol. 41, 3 (2018), 64--75. http://sites.computer.org/debull/A18sept/p64.pdfGoogle ScholarGoogle Scholar
  40. Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S Kester, and Demi Guo. 2018b. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 535--550. https://doi.org/10.1145/3183713.3199671Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Oliver Kennedy and Lukasz Ziarek. 2015. Just-In-Time Data Structures. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper9.pdfGoogle ScholarGoogle Scholar
  42. Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, and Themis Palpanas. 2018. Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. VLDB, Vol. 11, 6 (2018), 677--690. http://www.vldb.org/pvldb/vol11/p677-kondylakis.pdfGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  43. Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, and Themis Palpanas. 2019. Coconut Palm: Static and Streaming Data Series Exploration Now in your Palm. In SIGMOD .Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Donald Kossman. 2018. Systems Research - Fueling Future Disruptions. In Keynote talk at the Microsoft Research Faculty Summit. Redmond, WA, USA. https://www.microsoft.com/en-us/research/video/systems-research-fueling-future-disruptions/Google ScholarGoogle Scholar
  45. Avinash Lakshman and Prashant Malik. 2010. Cassandra - A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, Vol. 44, 2 (2010), 35--40. http://dl.acm.org/citation.cfm?id=1773912.1773922Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Hyeontaek Lim, Dongsu Han, David G Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). 429--444. https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/limGoogle ScholarGoogle Scholar
  47. LinkedIn. [n. d.]. Voldemort. http://www.project-voldemort.com ([n. d.]).Google ScholarGoogle Scholar
  48. Zezhou Liu and Stratos Idreos. 2016. Main Memory Adaptive Denormalization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 2253--2254. https://doi.org/10.1145/2882903.2914835Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 133--148. https://www.usenix.org/conference/fast16/technical-sessions/presentation/luGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  50. Tim Mattson, Beverly Sanders, and Berna Massingill. 2004. Patterns for Parallel Programming .Addison-Wesley Professional.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Memcached. [n. d.]. Reference. http://memcached.org/ ( [n. d.]).Google ScholarGoogle Scholar
  52. MongoDB. [n. d.]. Online reference. http://www.mongodb.com/ ( [n. d.]).Google ScholarGoogle Scholar
  53. Michael A. Olson, Keith Bostic, and Margo I. Seltzer. 1999. Berkeley DB. In Proceedings of the USENIX Annual Technical Conference (ATC). 183--191. http://www.usenix.org/events/usenix99/olson.htmlGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  54. Patrick E. O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica, Vol. 33, 4 (1996), 351--385. http://dl.acm.org/citation.cfm?id=230823.230826Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Eleni Petraki, Stratos Idreos, and Stefan Manegold. 2015. Holistic Indexing in Main-memory Column-stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data .Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Holger Pirk, Eleni Petraki, Stratos Idreos, Stefan Manegold, and Martin L. Kersten. 2014. Database cracking: fancy scan, not poor man's sort!. In Proceedings of the International Workshop on Data Management on New Hardware (DAMON). 1--8. https://doi.org/10.1145/2619228.2619232Google ScholarGoogle Scholar
  57. Redis. [n. d.]. Online reference. http://redis.io/ ([n. d.]).Google ScholarGoogle Scholar
  58. Kai Ren, Qing Zheng, Joy Arulraj, and Garth Gibson. 2017. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment, Vol. 10, 13 (2017), 2037--2048. http://www.vldb.org/pvldb/vol10/p2037-ren.pdfGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  59. Stephen M. Rumble, Ankita Kejriwal, and John K. Ousterhout. 2014. Log-structured memory for DRAM-based storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 1--16. https://www.usenix.org/conference/fast14/technical-sessions/presentation/rumbleGoogle ScholarGoogle Scholar
  60. Felix Martin Schuhknecht, Alekh Jindal, and Jens Dittrich. 2013. The Uncracked Pieces in Database Cracking. Proceedings of the VLDB Endowment, Vol. 7, 2 (2013), 97--108. http://www.vldb.org/pvldb/vol7/p97-schuhknecht.pdfGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  61. Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 217--228. https://doi.org/10.1145/2213836.2213862Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Justin Sheehy and David Smith. 2010. Bitcask: A Log-Structured Hash Table for Fast Key/Value Data. Basho White Paper (2010).Google ScholarGoogle Scholar
  63. Daniel Dominic Sleator and Robert Endre Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM, Vol. 32, 3 (1985), 652--686. https://doi.org/10.1145/3828.3835Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Spotify. 2014. Sparkey. https://github.com/spotify/sparkey (2014).Google ScholarGoogle Scholar
  65. Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 2--11. https://doi.org/10.1109/ICDE.2005.1Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. WiredTiger. [n. d.]. Source Code. https://github.com/wiredtiger/wiredtiger ([n. d.]).Google ScholarGoogle Scholar
  67. Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 323--336. https://doi.org/10.1145/3183713.3196931Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Kostas Zoumpatianos, Stratos Idreos, and Themis Palpanas. 2014. Indexing for interactive exploration of big data series. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1555--1566. https://doi.org/10.1145/2588555.2610498Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Key-Value Storage Engines

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
          June 2020
          2925 pages
          ISBN:9781450367356
          DOI:10.1145/3318464

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 31 May 2020

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • short-paper

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader