Abstract
Owing to its low latency, byte addressability, non-volatility, and high density, persistent memory (PM) is expected to serve as the basis for high-performance storage systems. However, PM also has disadvantages such as limited endurance, which poses challenges to traditional index technologies such as the B+ tree. The B+ tree was originally designed for dynamic random access memory (DRAM)-based or disk-based systems and suffers from high write amplification, which is detrimental to a PM-based system. This paper proposes WO-tree, a write-optimized B+ tree for PM. WO-tree adopts an unordered write mechanism for leaf nodes, which eliminates a large number of the write operations otherwise needed to keep leaf entries in order. When a leaf node is split, WO-tree flushes cache lines only after all write operations are completed, which avoids frequent flushing. WO-tree also adopts a partial logging mechanism that writes logs only for leaf nodes. An inconsistency in an inner node is detected during read operations and repaired using the leaf node information, which significantly reduces the logging overhead. Furthermore, WO-tree adopts lock-free search for inner nodes, which reduces the locking overhead of concurrent operations. We evaluate WO-tree using the Yahoo! Cloud Serving Benchmark (YCSB) workloads. Compared with the traditional B+ tree, wB-tree, and Fast-Fair, WO-tree reduces the number of cache line flushes caused by insertion operations by 84.7%, 22.2%, and 30.8%, respectively, and reduces the execution time by 84.3%, 27.3%, and 44.7%, respectively.
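The effect of unordered leaf writes can be illustrated with a small model. The following is a minimal sketch under stated assumptions, not the WO-tree implementation: the class names `SortedLeaf` and `UnorderedLeaf` and the `slot_writes` counter are hypothetical, the leaf is an ordinary in-memory list rather than PM, and the counter tallies entry-slot writes only (it does not model cache line flushes, logging, or node splits).

```python
import bisect


class SortedLeaf:
    """Hypothetical leaf that keeps keys sorted; an insert shifts larger entries."""

    def __init__(self):
        self.keys = []
        self.slot_writes = 0  # number of entry slots written so far

    def insert(self, key):
        pos = bisect.bisect_left(self.keys, key)
        # Every entry at or after pos moves one slot right, plus the new entry.
        self.slot_writes += (len(self.keys) - pos) + 1
        self.keys.insert(pos, key)


class UnorderedLeaf:
    """Hypothetical leaf that appends to the next free slot, ignoring order."""

    def __init__(self):
        self.keys = []
        self.slot_writes = 0

    def insert(self, key):
        # Only the new entry's slot is written; nothing is shifted.
        self.keys.append(key)
        self.slot_writes += 1

    def search(self, key):
        # Entries are unsorted, so lookup falls back to a linear scan.
        return key in self.keys


def compare(keys):
    """Insert the same keys into both leaves and return their slot-write counts."""
    sorted_leaf, unordered_leaf = SortedLeaf(), UnorderedLeaf()
    for k in keys:
        sorted_leaf.insert(k)
        unordered_leaf.insert(k)
    return sorted_leaf.slot_writes, unordered_leaf.slot_writes
```

In this model, inserting keys in descending order is the worst case for the sorted leaf, because every insert shifts all existing entries, while the unordered leaf always writes exactly one slot per insert. The trade-off is that a lookup inside an unordered leaf can no longer use binary search.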
Cite this article
Ma, RX., Wu, F., Dong, BR. et al. Write-Optimized B+ Tree Index Technology for Persistent Memory. J. Comput. Sci. Technol. 36, 1037–1050 (2021). https://doi.org/10.1007/s11390-021-1247-6
DOI: https://doi.org/10.1007/s11390-021-1247-6