DOI: 10.1145/3472456.3472475
research-article

A Universal Construction to implement Concurrent Data Structure for NUMA-multicore

Published: 05 October 2021

ABSTRACT

Universal constructions are attractive because they can turn a sequential implementation of any data structure into a concurrent implementation. However, existing universal constructions have limitations, such as high copying overhead or poor scalability on NUMA systems, mainly due to their lack of NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by using delegation within a NUMA node and a global shared log to keep the replicas of a data structure consistent across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR performs up to 11.2 times better than CX, a state-of-the-art universal construction, on our tested sequential data structures. To demonstrate the effectiveness and applicability of CR, we have applied it to an in-memory database system. The resulting database performs up to 18.1 times better than the original version.
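To make the abstract's description concrete, here is a minimal single-threaded sketch of the general idea of per-node replicas kept consistent through a shared log. It is not the CR algorithm: the abstract gives no implementation details, so every name here (`SharedLog`, `NodeReplica`, `catch_up`) is hypothetical, delegation is reduced to a plain method call, and the choice to replay the log before every read is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# Hypothetical names throughout; a single-threaded illustration only,
# not the CR algorithm from the paper.

Op = Tuple[str, tuple]  # (method name, arguments) of one update


@dataclass
class SharedLog:
    """Global, append-only log of update operations shared by all nodes."""
    entries: List[Op] = field(default_factory=list)

    def append(self, op: Op) -> int:
        self.entries.append(op)
        return len(self.entries)  # log length after the append


class NodeReplica:
    """One replica of a sequential data structure, owned by one NUMA node."""

    def __init__(self, make_struct: Callable[[], Any], log: SharedLog):
        self.struct = make_struct()  # unmodified sequential structure
        self.log = log
        self.applied = 0             # number of log entries replayed so far

    def catch_up(self) -> None:
        """Replay the log entries this replica has not applied yet."""
        while self.applied < len(self.log.entries):
            name, args = self.log.entries[self.applied]
            getattr(self.struct, name)(*args)
            self.applied += 1

    def update(self, name: str, *args) -> None:
        """Stand-in for delegation: log the update, then bring the
        local replica up to date (assumed behaviour)."""
        self.log.append((name, args))
        self.catch_up()

    def read(self, name: str, *args):
        """Serve a read from the local replica after replaying the log
        (assumed here for simplicity)."""
        self.catch_up()
        return getattr(self.struct, name)(*args)


log = SharedLog()
node0 = NodeReplica(set, log)  # Python's built-in set as the sequential structure
node1 = NodeReplica(set, log)

node0.update("add", 7)
node1.update("add", 9)
node0.update("discard", 7)
```

In this sketch, `node1.read("__contains__", 7)` first replays the pending `discard` from the log, so both replicas converge to the same contents even though each update was issued on only one node. A real implementation would additionally need atomic log appends, per-node synchronization, and log reclamation, none of which are modelled here.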

  • Published in

    ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
    August 2021
    927 pages
    ISBN: 9781450390682
    DOI: 10.1145/3472456

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 91 of 313 submissions, 29%
