
Scalable NUMA-aware persistent B+-tree for non-volatile memory devices

Published in Cluster Computing.

A Correction to this article was published on 17 October 2023


Abstract

Emerging manycore servers with Intel DC persistent memory (DCPM) are equipped with hundreds of CPU cores spread across multiple CPU sockets, and are designed to deliver high performance and scalability. Several recent studies have proposed persistent, fault-tolerant indexes for DCPM. Fast & Fair (F&F) is the state-of-the-art concurrent B+-tree variant for DCPM. However, its adoption on manycore servers is hampered by limited scalability: synchronization, including structure modification operations, relies on lengthy lock-based critical sections, and the lack of NUMA awareness adds further overhead from remote memory accesses. In this paper, we propose F³-tree, a concurrent, NUMA-aware, persistent, future-based B+-tree for DCPM servers. F³-tree combines per-thread local future objects with a global B+-tree. To introduce NUMA awareness and minimize remote memory accesses, F³-tree adopts per-socket dedicated asynchronous evaluation threads that checkpoint future objects into the global B+-tree. F³-tree also employs an in-memory hash table to mitigate the read overhead of key searches over the future objects. We implemented F³-tree atop F&F and evaluated its performance on Linux using both synthetic and realistic workloads. Our evaluation shows that F³-tree outperforms F&F on average by 3.4× without NUMA awareness and by 5× with it.
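The write path sketched in the abstract (buffer inserts as per-thread future objects, index them in a DRAM hash table so reads need not scan the buffer, and have an evaluation thread later checkpoint them into the global B+-tree) can be illustrated with a minimal, single-threaded C++ sketch. All class and function names here are ours, not the paper's; `std::map` stands in for the persistent global B+-tree, and the asynchronous per-socket evaluation thread is modeled as an explicit `checkpoint()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <unordered_map>

// Hypothetical sketch of the F3-tree write path (single-threaded simplification).
struct FutureObject {
    long key;
    long value;
};

class F3Sketch {
public:
    // Insert: append a future object (FO) to the thread-local buffer
    // and index it in the DRAM hash table.
    void insert(long key, long value) {
        pending_.push_back({key, value});
        pending_index_[key] = value;  // latest pending value wins
    }

    // Lookup: pending FOs (found via the hash table) shadow the global tree.
    std::optional<long> lookup(long key) const {
        auto it = pending_index_.find(key);
        if (it != pending_index_.end()) return it->second;
        auto jt = global_tree_.find(key);
        if (jt != global_tree_.end()) return jt->second;
        return std::nullopt;
    }

    // What a per-socket evaluation thread would do asynchronously:
    // drain the FO buffer into the global tree in arrival order.
    void checkpoint() {
        for (const auto& fo : pending_) global_tree_[fo.key] = fo.value;
        pending_.clear();
        pending_index_.clear();
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    std::deque<FutureObject> pending_;              // stands in for one PTFO list
    std::unordered_map<long, long> pending_index_;  // DRAM hash table over FOs
    std::map<long, long> global_tree_;              // stand-in for the global B+-tree
};
```

The key property this models is that a writer returns as soon as its future object is buffered locally, while lookups remain correct because the hash table makes pending writes visible before they reach the shared tree.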



Data availability

None.

Change history

  • 27 October 2023

    The original online version of this article was revised: the email addresses of the authors Awais Khan and Bernd Burgstaller have been corrected.

  • 17 October 2023

    A Correction to this paper has been published: https://doi.org/10.1007/s10586-023-04176-7

Notes

  1. We refer to multi-socket manycore machines as manycore machines hereafter.

  2. For simplicity, from here onward we refer to thread-local doubly-linked lists of FOs as PTFOs.

References

  1. Jamil, S., Khan, A., Burgstaller, B., Kim, Y.: Towards scalable manycore-aware persistent B+- trees for efficient indexing in cloud environments. In: Proceedings of the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), pp. 44–49 (2021)

  2. Khan, A., Sim, H., Vazhkudai, S. S., Ma, J., Oh, M.-H., Kim, Y.: Persistent memory object storage and indexing for scientific computing. In: Proceedings of the 2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp. 1–9 (2020)

  3. Yang, J., Kim, J., Hoseinzadeh, M., Izraelevitz, J., Swanson, S.: An empirical guide to the behavior and use of scalable persistent memory. In: Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST 20), pp. 169–182 (2020)

  4. Kim, J.-H., Kim, Y., Jamil, S., Park, S.: A NUMA-aware NVM file system design for manycore server applications. In: Proceedings of the 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 1–5 (2020)

  5. Kim, T., Khan, A., Kim, Y., Kasu, P., Atchley, S.: NUMA-aware thread scheduling for big data transfers over terabits network infrastructure. Sci. Program. 2018, 4120561 (2018)

  6. Kim, J., Kim, Y., Khan, A., Park, S.: Understanding the performance of storage class memory file systems in the NUMA architecture. Clust. Comput. 22(2), 347–360 (2019)


  7. Wang, Q., Lu, Y., Li, J., Shu, J.: Nap: a black-box approach to NUMA-aware persistent memory indexes. In: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pp. 93–111. USENIX Association, Berkeley (2021)

  8. Khan, A., Lee, C.-G., Hamandawana, P., Park, S., Kim, Y.: A robust fault-tolerant and scalable cluster-wide deduplication for shared-nothing storage systems. In: Proceedings of the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 87–93 (2018)

  9. Chen, S., Jin, Q.: Persistent B+-trees in non-volatile main memory. Proc. VLDB Endow. 8, 786–797 (2015)


  10. Oukid, I., Lasperas, J., Nica, A., Willhalm, T., Lehner, W.: FPTree: a hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 371–386. Association for Computing Machinery, New York (2016)

  11. Hwang, D., Kim, W.-H., Won, Y., Nam, B.: Endurable transient inconsistency in byte-addressable persistent B+-tree. In: Proceedings of the 16th USENIX Conference on File and Storage Technologies, FAST’18, pp. 187–200 (2018)

  12. Khan, A., Sim, H., Vazhkudai, S.S., Kim, Y.: MOSIQS: Persistent memory object storage with metadata indexing and querying for scientific computing. IEEE Access 9, 85217–85231 (2021)


  13. Yang, J., Wei, Q., Chen, C., Wang, C., Yong, K.L., He, B.: NV-Tree: reducing consistency cost for NVM-based single level systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 167–181 (2015)

  14. Scott, M.L.: Shared-Memory Synchronization. Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, San Francisco (2013)

  15. Dice, D., Marathe, V. J., Shavit, N.: Flat-combining NUMA locks. In: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’11, pp. 65–74. Association for Computing Machinery, New York (2011)

  16. Chabbi, M., Fagan, M., Mellor-Crummey, J.: High performance locks for multi-level NUMA systems. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pp. 215–226. Association for Computing Machinery, New York (2015)

  17. Kogan, A., Herlihy, M.: The future(s) of shared data structures. In: Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, PODC ’14, pp. 30–39 (2014)

  18. Calciu, I., Sen, S., Balakrishnan, M., Aguilera, M.K.: Black-box concurrent data structures for NUMA architectures. In: Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’17 (New York, NY, USA), pp. 207–221. Association for Computing Machinery, New York (2017)

  19. Lehman, P.L., Yao, S.B.: Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst. 6, 650–670 (1981)


  20. Yi, Z., Yao, Y., Chen, K.: A universal construction to implement concurrent data structure for NUMA-multicore. In: Proceedings of the 50th International Conference on Parallel Processing (New York, NY, USA), Association for Computing Machinery, New York (2021)

  21. Calciu, I., Gottschlich, J., Herlihy, M.: Using elimination and delegation to implement a scalable NUMA-friendly stack. In: Proceedings of the 5th USENIX Workshop on Hot Topics in Parallelism (HotPar 13) (San Jose, CA), USENIX Association, Berkeley (2013)

  22. Bhardwaj, A., Kulkarni, C., Achermann, R., Calciu, I., Kashyap, S., Stutsman, R., Tai, A., Zellweger, G.: NrOS: effective replication and sharing in an operating system. In: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pp. 295–312. USENIX Association, Berkeley (2021)

  23. Lee, S.K., Mohan, J., Kashyap, S., Kim, T., Chidambaram, V.: Recipe: converting concurrent DRAM indexes to persistent-memory indexes. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pp. 462–477 (2019)

  24. Numactl. https://linux.die.net/man/8/numactl. Accessed 6 Apr 2021

  25. Defining the future of in-memory database computing. https://pmem.io/vmem/libvmmalloc/. Accessed 1 Dec 2021

  26. Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco (2012)


  27. Ramalhete, P., Correia, A.: Brief announcement: hazard eras—non-blocking memory reclamation. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’17, pp. 367–369. Association for Computing Machinery, New York (2017)

  28. Cohen, N., Petrank, E.: Efficient memory management for lock-free data structures with optimistic access. In: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pp. 254–263. Association for Computing Machinery, New York (2015)

  29. Fast & Fair B+-tree. https://github.com/DICL/FAST_FAIR. Accessed 07 Feb 2022

  30. Yahoo cloud serving benchmark. https://github.com/brianfrankcooper/YCSB/. Accessed 21 Jan 2022

  31. Lu, M., Fang, J.Z.: A solution of the cache ping-pong problem in multiprocessor systems. J. Parallel Distrib. Comput. 16(2), 158–171 (1992)


  32. Xia, F., Jiang, D., Xiong, J., Sun, N.: HiKV: a hybrid index key-value store for DRAM-NVM memory systems. In: Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC 17) (Santa Clara, CA), pp. 349–362. USENIX Association, Berkeley (2017)

  33. Venkataraman, S., Tolia, N., Ranganathan, P., Campbell, R.H.: Consistent and durable data structures for non-volatile byte-addressable memory. In: Proceedings of the 9th USENIX Conference on File and Storage Technologies, FAST’11, p. 5 (2011)

  34. Liu, M., Xing, J., Chen, K., Wu, Y.: Building scalable NVM-based B+tree with HTM. In: Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019 (New York, NY, USA). Association for Computing Machinery, New York (2019)

  35. Yang, J., Wei, Q., Chen, C., Wang, C., Yong, K. L., He, B.: NV-Tree: Reducing consistency cost for NVM-based single level systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 15) (Santa Clara, CA), pp. 167–181. USENIX Association, Berkeley (2015)

  36. Zuo, P., Hua, Y.: A write-friendly and cache-optimized hashing scheme for non-volatile memory systems. IEEE Trans. Parallel Distrib. Syst. 29(5), 985–998 (2017)


  37. Lee, S.K., Lim, K.H., Song, H., Nam, B., Noh, S.H.: WORT: Write optimal radix tree for persistent memory storage systems. In: Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST 17) (Santa Clara, CA), pp. 257–270. USENIX Association, Berkeley (2017)

  38. Zhou, X., Shou, L., Chen, K., Hu, W., Chen, G.: DPTree: Differential indexing for persistent memory. Proc. VLDB Endow. 13, 421–434 (2019)


  39. Lee, C.-G., Noh, S., Kang, H., Hwang, S., Kim, Y.: Concurrent file metadata structure using readers-writer lock. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing (New York, NY, USA), pp. 1172–1181. Association for Computing Machinery, New York (2021)

  40. Kim, J.-H., Kim, Y., Jamil, S., Lee, C.-G., Park, S.: Parallelizing shared file I/O operations of NVM file system for manycore servers. IEEE Access 9, 24570–24585 (2021)


  41. Peng, I.B., Gokhale, M.B., Green, E.W.: System evaluation of the Intel Optane byte-addressable NVM. In: Proceedings of the International Symposium on Memory Systems, MEMSYS ’19 (New York, NY, USA), pp. 304–315. Association for Computing Machinery, New York (2019)

  42. Daase, B., Bollmeier, L.J., Benson, L., Rabl, T.: Maximizing persistent memory bandwidth utilization for OLAP workloads. In: Proceedings of the 2021 International Conference on Management of Data, SIGMOD/PODS ’21 (New York, NY, USA), pp. 339–351. Association for Computing Machinery, New York (2021)

  43. Xu, J., Swanson, S.: NOVA: a log-structured file system for hybrid Volatile/Non-volatile main memories. In: Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST 16) (Santa Clara, CA), pp. 323–338. USENIX Association, Berkeley (2016)


Acknowledgements

The authors would like to thank the reviewers for their valuable comments, which helped improve this work.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2014386) and by the Institute of Information and Communications Technology Planning and Evaluation (IITP), Korea government (MSIT) (Development of low-latency storage module for I/O intensive edge data processing) under grant No. 2020-0-00104. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information


Contributions

SJ developed the idea and carried out most of the implementation and evaluation. AS and AK contributed to the technical discussion and to the manuscript writing and proofreading, along with BB and S-SP. As the corresponding author, YK supervised the entire process, from idea development through implementation, experimentation, and evaluation to paper writing.

Corresponding author

Correspondence to Youngjae Kim.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Informed consent

All authors have been informed of and consent to the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the affiliations of the authors Awais Khan and Bernd Burgstaller were inadvertently swapped and have now been corrected.

A preliminary version of this article [Jamil et al., in Proceedings of the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)] was presented at the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Washington DC, USA, September, 2021.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jamil, S., Salam, A., Khan, A. et al. Scalable NUMA-aware persistent B+-tree for non-volatile memory devices. Cluster Comput 26, 2865–2881 (2023). https://doi.org/10.1007/s10586-022-03766-1

