
Scalable NUMA-aware persistent B+-tree for non-volatile memory devices

Published in Cluster Computing.

A Correction to this article was published on 17 October 2023


Abstract

Emerging manycore servers with Intel DC persistent memory (DCPM) are equipped with hundreds of CPU cores spread across multiple CPU sockets, and are designed to deliver high performance and scalability. Several recent studies have proposed persistent, fault-tolerant indexes for DCPM. Fast & Fair (F&F) is the state-of-the-art concurrent B+-tree variant for DCPM. However, its adoption on manycore servers is hampered by limited scalability: synchronization, including structure modification operations, relies on lengthy lock-based critical sections, and the lack of NUMA awareness adds further overhead from remote memory accesses. In this paper, we propose F³-tree, a concurrent, NUMA-aware, persistent, future-based B+-tree for DCPM servers. F³-tree combines per-thread local future objects with a global B+-tree. To introduce NUMA awareness and minimize remote memory accesses, F³-tree adopts per-socket dedicated asynchronous evaluation threads that checkpoint future objects into the global B+-tree. F³-tree also employs an in-memory hash table to mitigate the read overhead of key searches over the future objects. We implemented F³-tree atop F&F and evaluated its performance on Linux using both synthetic and realistic workloads. Our evaluation shows that F³-tree outperforms F&F on average by 3.4× without NUMA awareness and by 5× with it.
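The write path sketched in the abstract (buffer inserts as per-thread future objects, index them in a DRAM hash table so reads need not scan the buffer, and have an evaluation thread later checkpoint them into the global B+-tree) can be illustrated with a minimal, single-threaded C++ sketch. All class and function names here are ours, not the paper's; `std::map` stands in for the persistent global B+-tree, and the asynchronous per-socket evaluation thread is modeled as an explicit `checkpoint()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <unordered_map>

// Hypothetical sketch of the F3-tree write path (single-threaded simplification).
struct FutureObject {
    long key;
    long value;
};

class F3Sketch {
public:
    // Insert: append a future object (FO) to the thread-local buffer
    // and index it in the DRAM hash table.
    void insert(long key, long value) {
        pending_.push_back({key, value});
        pending_index_[key] = value;  // latest pending value wins
    }

    // Lookup: pending FOs (found via the hash table) shadow the global tree.
    std::optional<long> lookup(long key) const {
        auto it = pending_index_.find(key);
        if (it != pending_index_.end()) return it->second;
        auto jt = global_tree_.find(key);
        if (jt != global_tree_.end()) return jt->second;
        return std::nullopt;
    }

    // What a per-socket evaluation thread would do asynchronously:
    // drain the FO buffer into the global tree in arrival order.
    void checkpoint() {
        for (const auto& fo : pending_) global_tree_[fo.key] = fo.value;
        pending_.clear();
        pending_index_.clear();
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    std::deque<FutureObject> pending_;              // stands in for one PTFO list
    std::unordered_map<long, long> pending_index_;  // DRAM hash table over FOs
    std::map<long, long> global_tree_;              // stand-in for the global B+-tree
};
```

The key property this models is that a writer returns as soon as its future object is buffered locally, while lookups remain correct because the hash table makes pending writes visible before they reach the shared tree.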



Data availability

None.

Change history

  • 27 October 2023

    The original online version of this article was revised: the email addresses of the authors Awais Khan and Bernd Burgstaller have been corrected.

  • 17 October 2023

    A Correction to this paper has been published: https://doi.org/10.1007/s10586-023-04176-7

Notes

  1. We refer to multi-socket manycore machines as manycore machines hereafter.

  2. For simplicity, from here onward we refer to thread-local doubly-linked lists of FOs as PTFOs.

References

  1. Jamil, S., Khan, A., Burgstaller, B., Kim, Y.: Towards scalable manycore-aware persistent B+- trees for efficient indexing in cloud environments. In: Proceedings of the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), pp. 44–49 (2021)

  2. Khan, A., Sim, H., Vazhkudai, S. S., Ma, J., Oh, M.-H., Kim, Y.: Persistent memory object storage and indexing for scientific computing. In: Proceedings of the 2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp. 1–9 (2020)

  3. Yang, J., Kim, J., Hoseinzadeh, M., Izraelevitz, J., Swanson, S.: An empirical guide to the behavior and use of scalable persistent memory. In: Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST 20), pp. 169–182 (2020)

  4. Kim, J.-H., Kim, Y., Jamil, S., Park, S.: A NUMA-aware NVM file system design for manycore server applications. In: Proceedings of the 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 1–5 (2020)

  5. Kim, T., Khan, A., Kim, Y., Kasu, P., Atchley, S.: NUMA-aware thread scheduling for big data transfers over terabits network infrastructure. Sci. Program. 2018, 4120561 (2018)

  6. Kim, J., Kim, Y., Khan, A., Park, S.: Understanding the performance of storage class memory file systems in the NUMA architecture. Clust. Comput. 22(2), 347–360 (2019)


  7. Wang, Q., Lu, Y., Li, J., Shu, J.: Nap: a black-box approach to NUMA-aware persistent memory indexes. In: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pp. 93–111. USENIX Association, Berkeley (2021)

  8. Khan, A., Lee, C.-G., Hamandawana, P., Park, S., Kim, Y.: A robust fault-tolerant and scalable cluster-wide deduplication for shared-nothing storage systems. In: Proceedings of the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 87–93 (2018)

  9. Chen, S., Jin, Q.: Persistent B+-trees in non-volatile main memory. Proc. VLDB Endow. 8, 786–797 (2015)


  10. Oukid, I., Lasperas, J., Nica, A., Willhalm, T., Lehner, W.: FPTree: a hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 371–386. Association for Computing Machinery, New York (2016)

  11. Hwang, D., Kim, W.-H., Won, Y., Nam, B.: Endurable transient inconsistency in byte-addressable persistent B+-tree. In: Proceedings of the 16th USENIX Conference on File and Storage Technologies, FAST’18, pp. 187–200 (2018)

  12. Khan, A., Sim, H., Vazhkudai, S.S., Kim, Y.: MOSIQS: Persistent memory object storage with metadata indexing and querying for scientific computing. IEEE Access 9, 85217–85231 (2021)


  13. Yang, J., Wei, Q., Chen, C., Wang, C., Yong, K.L., He, B.: NV-Tree: reducing consistency cost for NVM-based single level systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 167–181 (2015)

  14. Scott, M.L.: Shared-Memory Synchronization. Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, San Francisco (2013)

  15. Dice, D., Marathe, V. J., Shavit, N.: Flat-combining NUMA locks. In: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’11, pp. 65–74. Association for Computing Machinery, New York (2011)

  16. Chabbi, M., Fagan, M., Mellor-Crummey, J.: High performance locks for multi-level NUMA systems. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pp. 215–226. Association for Computing Machinery, New York (2015)

  17. Kogan, A., Herlihy, M.: The future(s) of shared data structures. In: Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, PODC ’14, pp. 30–39 (2014)

  18. Calciu, I., Sen, S., Balakrishnan, M., Aguilera, M.K.: Black-box concurrent data structures for NUMA architectures. In: Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’17 (New York, NY, USA), pp. 207–221. Association for Computing Machinery, New York (2017)

  19. Lehman, P.L., Yao, S.B.: Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst. 6, 650–670 (1981)


  20. Yi, Z., Yao, Y., Chen, K.: A universal construction to implement concurrent data structure for NUMA-multicore. In: Proceedings of the 50th International Conference on Parallel Processing (New York, NY, USA), Association for Computing Machinery, New York (2021)

  21. Calciu, I., Gottschlich, J., Herlihy, M.: Using elimination and delegation to implement a scalable NUMA-friendly stack. In: Proceedings of the 5th USENIX Workshop on Hot Topics in Parallelism (HotPar 13) (San Jose, CA), USENIX Association, Berkeley (2013)

  22. Bhardwaj, A., Kulkarni, C., Achermann, R., Calciu, I., Kashyap, S., Stutsman, R., Tai, A., Zellweger, G.: NrOS: effective replication and sharing in an operating system. In: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pp. 295–312. USENIX Association, Berkeley (2021)

  23. Lee, S.K., Mohan, J., Kashyap, S., Kim, T., Chidambaram, V.: Recipe: converting concurrent DRAM indexes to persistent-memory indexes. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pp. 462–477 (2019)

  24. Numactl. https://linux.die.net/man/8/numactl. Accessed 6 Apr 2021

  25. Defining the future of in-memory database computing. https://pmem.io/vmem/libvmmalloc/. Accessed 1 Dec 2021

  26. Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco (2012)


  27. Ramalhete, P., Correia, A.: Brief announcement: hazard eras—non-blocking memory reclamation. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’17, pp. 367–369. Association for Computing Machinery, New York (2017)

  28. Cohen, N., Petrank, E.: Efficient memory management for lock-free data structures with optimistic access. In: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pp. 254–263. Association for Computing Machinery, New York (2015)

  29. Fast & Fair B+-tree. https://github.com/DICL/FAST_FAIR. Accessed 07 Feb 2022

  30. Yahoo cloud serving benchmark. https://github.com/brianfrankcooper/YCSB/. Accessed 21 Jan 2022

  31. Lu, M., Fang, J.Z.: A solution of the cache ping-pong problem in multiprocessor systems. J. Parallel Distrib. Comput. 16(2), 158–171 (1992)


  32. Xia, F., Jiang, D., Xiong, J., Sun, N.: HiKV: a hybrid index key-value store for DRAM-NVM memory systems. In: Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC 17) (Santa Clara, CA), pp. 349–362. USENIX Association, Berkeley (2017)

  33. Venkataraman, S., Tolia, N., Ranganathan, P., Campbell, R.H.: Consistent and durable data structures for non-volatile byte-addressable memory. In: Proceedings of the 9th USENIX Conference on File and Storage Technologies, FAST’11, p. 5 (2011)

  34. Liu, M., Xing, J., Chen, K., Wu, Y.: Building scalable NVM-based B+tree with HTM. In: Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019 (New York, NY, USA). Association for Computing Machinery, New York (2019)

  35. Yang, J., Wei, Q., Chen, C., Wang, C., Yong, K. L., He, B.: NV-Tree: Reducing consistency cost for NVM-based single level systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 15) (Santa Clara, CA), pp. 167–181. USENIX Association, Berkeley (2015)

  36. Zuo, P., Hua, Y.: A write-friendly and cache-optimized hashing scheme for non-volatile memory systems. IEEE Trans. Parallel Distrib. Syst. 29(5), 985–998 (2017)


  37. Lee, S.K., Lim, K.H., Song, H., Nam, B., Noh, S.H.: WORT: Write optimal radix tree for persistent memory storage systems. In: Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST 17) (Santa Clara, CA), pp. 257–270. USENIX Association, Berkeley (2017)

  38. Zhou, X., Shou, L., Chen, K., Hu, W., Chen, G.: DPTree: Differential indexing for persistent memory. Proc. VLDB Endow. 13, 421–434 (2019)


  39. Lee, C.-G., Noh, S., Kang, H., Hwang, S., Kim, Y.: Concurrent file metadata structure using readers-writer lock. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing (New York, NY, USA), pp. 1172–1181. Association for Computing Machinery, New York (2021)

  40. Kim, J.-H., Kim, Y., Jamil, S., Lee, C.-G., Park, S.: Parallelizing shared file I/O operations of NVM file system for manycore servers. IEEE Access 9, 24570–24585 (2021)


  41. Peng, I.B., Gokhale, M.B., Green, E.W.: System evaluation of the Intel Optane byte-addressable NVM. In: Proceedings of the International Symposium on Memory Systems, MEMSYS ’19 (New York, NY, USA), pp. 304–315. Association for Computing Machinery, New York (2019)

  42. Daase, B., Bollmeier, L.J., Benson, L., Rabl, T.: Maximizing persistent memory bandwidth utilization for OLAP workloads. In: Proceedings of the 2021 International Conference on Management of Data, SIGMOD/PODS ’21 (New York, NY, USA), pp. 339–351. Association for Computing Machinery, New York (2021)

  43. Xu, J., Swanson, S.: NOVA: a log-structured file system for hybrid Volatile/Non-volatile main memories. In: Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST 16) (Santa Clara, CA), pp. 323–338. USENIX Association, Berkeley (2016)


Acknowledgements

The authors would like to thank the reviewers for their valuable comments, which helped improve this work.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2014386) and by the Institute of Information and Communications Technology Planning and Evaluation (IITP), Korea government (MSIT) (Development of low-latency storage module for I/O intensive edge data processing) under grant No. 2020-0-00104. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information


Contributions

SJ developed the idea and carried out most of the implementation and evaluation. AS and AK contributed to the technical discussion and to the manuscript writing and proofreading, along with BB and S-SP. As the corresponding author, YK supervised the entire process, from idea development through implementation, experimentation, and evaluation to paper writing.

Corresponding author

Correspondence to Youngjae Kim.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Informed consent

All authors have been informed of and consent to the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the affiliations of the authors Awais Khan and Bernd Burgstaller were inadvertently swapped and have now been corrected.

A preliminary version of this article [Jamil et al., in Proceedings of the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)] was presented at the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Washington DC, USA, September, 2021.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jamil, S., Salam, A., Khan, A. et al. Scalable NUMA-aware persistent B+-tree for non-volatile memory devices. Cluster Comput 26, 2865–2881 (2023). https://doi.org/10.1007/s10586-022-03766-1

