ABSTRACT
Key-value stores are everywhere. They power a diverse set of data-driven applications across both industry and science. They are used as stand-alone NoSQL systems, and also as components of more complex pipelines and systems, such as machine learning and relational systems. In this tutorial, we survey state-of-the-art approaches to designing the core storage engine of a key-value store. We focus on several critical components of the engine, starting with the core data structures that lay out data across the memory hierarchy. We also discuss design issues related to caching, timestamps, concurrency control, updates, shifting workloads, and mixed workloads with both analytical and transactional characteristics. We cover designs that are read-optimized, write-optimized, and hybrid. We draw examples from several state-of-the-art systems, and we also put everything together in a general framework that lets us model storage engine designs under a single unified model and reason about the expected behavior of diverse designs. In addition, we show that, given the vast number of possible storage engine designs and their complexity, there is a need to describe and communicate design decisions in a high-level descriptive language, and we present a first version of such a language. We then use that framework to present several open challenges in the field, especially in supporting increasingly diverse and dynamic applications in the era of data science and AI, including neural networks, graphs, and data versioning.