ABSTRACT
Universal constructions are attractive because they can turn a sequential implementation of any data structure into a concurrent one. However, existing universal constructions have limitations: they impose high copying overhead, or they scale poorly on NUMA systems, mainly because they were not designed with NUMA in mind. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by using delegation within a NUMA node and a global shared log that keeps the per-node replicas of the data structure consistent. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that, on the sequential data structures we tested, CR performs up to 11.2 times better than CX, a state-of-the-art universal construction. To demonstrate the effectiveness and applicability of CR, we have applied it to an in-memory database system. The resulting database performs up to 18.1 times better than the original version.
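The replica-plus-shared-log idea described above can be illustrated with a minimal Python sketch. This is not CR's actual algorithm, only an illustration under stated assumptions: the names `SharedLog` and `NodeReplica` are hypothetical, a per-replica lock stands in for node-local delegation, and ordinary Python objects stand in for NUMA nodes. Updates are appended to one global log, and each replica replays unseen log entries on its own copy of an unmodified sequential structure before serving a read.

```python
import threading


class SharedLog:
    """Global append-only log of update operations (hypothetical name)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def append(self, op):
        # Record the update and return the new tail index.
        with self._lock:
            self._entries.append(op)
            return len(self._entries)

    def tail(self):
        with self._lock:
            return len(self._entries)

    def read(self, start, end):
        with self._lock:
            return self._entries[start:end]


class NodeReplica:
    """One replica of a sequential structure per (simulated) NUMA node."""

    def __init__(self, log, make_structure):
        self._log = log
        self._data = make_structure()  # plain sequential structure
        self._applied = 0              # log entries already replayed
        # Assumption: a lock stands in for the node-local delegate thread.
        self._lock = threading.Lock()

    def _catch_up(self, upto):
        # Replay log entries this replica has not yet applied.
        for op in self._log.read(self._applied, upto):
            op(self._data)
        self._applied = upto

    def update(self, op):
        with self._lock:
            tail = self._log.append(op)
            self._catch_up(tail)

    def read(self, query):
        with self._lock:
            self._catch_up(self._log.tail())
            return query(self._data)


log = SharedLog()
node0 = NodeReplica(log, dict)
node1 = NodeReplica(log, dict)
node0.update(lambda d: d.__setitem__("k", 1))
node1.update(lambda d: d.__setitem__("j", 2))
value_seen_on_node0 = node0.read(lambda d: d.get("j"))
```

In this sketch the sequential structure (a `dict`) needs no changes, which mirrors the claim that no concurrent data structure expertise is required. A real implementation would replace the per-replica lock with delegation and bound the log's size.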