FlatStore: An Efficient Log-Structured Key-Value Storage Engine for Persistent Memory

Published: 13 March 2020

Abstract

Emerging hardware such as persistent memory (PM) and high-speed NICs is promising for building efficient key-value stores. However, we observe that the small-sized access pattern of key-value stores does not match the persistence granularity of PM, leaving PM bandwidth underutilized. This paper proposes FlatStore, an efficient PM-based key-value storage engine. Specifically, it decouples the role of a KV store into a persistent log structure for efficient storage and a volatile index for fast indexing. On top of this, FlatStore incorporates two techniques: 1) a compacted log format to maximize the batching opportunity in the log; 2) pipelined horizontal batching, which steals log entries from other cores when creating a batch, thus delivering low latency and high throughput. We implement FlatStore with two volatile indexes, a hash table and Masstree. We deploy FlatStore on Optane DC Persistent Memory, and our experiments show that FlatStore achieves up to 35 Mops/s with a single server node, 2.5 to 6.3 times faster than existing systems.
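
The split between a persistent log and a volatile index can be illustrated with a toy model. The sketch below is not FlatStore's implementation: the class `LogStore`, its methods, and the `flushes` counter are hypothetical names, persistent memory is simulated by an ordinary Python list, and cross-core entry stealing is not modeled. It only shows, under those assumptions, how batching several small writes into one simulated flush and rebuilding the index from the log might fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, Optional[str]]  # (key, value); value None marks a delete


@dataclass
class LogStore:
    """Toy model: an append-only log (stand-in for PM) plus a volatile dict index."""
    log: List[Entry] = field(default_factory=list)      # simulated persistent part
    index: Dict[str, int] = field(default_factory=dict)  # volatile part
    flushes: int = 0                                     # simulated persistence operations

    def put_batch(self, entries: List[Entry]) -> None:
        """Append several small entries and persist them with one simulated flush."""
        start = len(self.log)
        self.log.extend(entries)
        self.flushes += 1  # one flush for the whole batch, not one per entry
        for offset, (key, value) in enumerate(entries, start):
            self._apply(key, value, offset)

    def get(self, key: str) -> Optional[str]:
        """Look up the newest value through the volatile index."""
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def recover(self) -> None:
        """Rebuild the volatile index by replaying the log (assumed recovery path)."""
        self.index.clear()
        for offset, (key, value) in enumerate(self.log):
            self._apply(key, value, offset)

    def _apply(self, key: str, value: Optional[str], offset: int) -> None:
        if value is None:
            self.index.pop(key, None)  # tombstone removes the key
        else:
            self.index[key] = offset   # index points at the newest log entry


store = LogStore()
store.put_batch([("a", "1"), ("b", "2"), ("a", "3")])  # three writes, one flush
store.put_batch([("b", None)])                         # delete "b"
```

In this model the index never needs to be persisted: clearing it and calling `recover()` restores the same lookups from the log alone.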

    Published In

    ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2020
    1412 pages
    ISBN:9781450371025
    DOI:10.1145/3373376

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. batching
    2. key-value store
    3. log structure
    4. persistent memory

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research & Development Program of China
    • National Natural Science Foundation of China
    • Research and Development Plan in Key field of Guangdong Province
    • Project of ZTE

    Conference

    ASPLOS '20

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%
