A Universal Construction to implement Concurrent Data Structure for NUMA-muticore

Authors: Zhengming Yi, Yiping Yao, Kai Chen (National University of Defense Technology)

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing, August 2021, Article No.: 74, Pages 1–11, https://doi.org/10.1145/3472456.3472475

Published: 05 October 2021

ABSTRACT

Universal constructions are attractive because they can turn a sequential implementation of any data structure into a concurrent one. However, existing universal constructions have limitations, such as high copying overhead or poor scalability on NUMA systems, mainly due to their lack of NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by using delegation within a NUMA node and a global shared log to keep the replicas of a data structure consistent across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR achieves up to 11.2 times better performance than CX, a state-of-the-art universal construction, on our tested sequential data structures. To demonstrate the effectiveness and applicability of CR, we have applied it to an in-memory database system. The resulting database shows up to 18.1 times better performance than the original version.
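
The replica-plus-shared-log idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's CR algorithm: the names `SharedLog`, `NodeReplica`, and `put` are hypothetical, a plain lock stands in for node-local delegation, and nothing here is pinned to real NUMA nodes. It only shows how per-node replicas of an unmodified sequential structure can stay consistent by replaying a common log of updates.

```python
import threading


class SharedLog:
    """Global append-only log of update operations (hypothetical sketch)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def append(self, op):
        # Returns the log position just past the newly appended entry.
        with self._lock:
            self._entries.append(op)
            return len(self._entries)

    def tail(self):
        with self._lock:
            return len(self._entries)

    def read(self, start, end):
        with self._lock:
            return self._entries[start:end]


class NodeReplica:
    """One replica of a sequential structure, meant to live on one node."""

    def __init__(self, log, make_structure):
        self._log = log
        self._state = make_structure()
        self._applied = 0  # number of log entries already replayed
        # Assumption: a lock stands in for the node-local delegate thread.
        self._lock = threading.Lock()

    def _catch_up(self, upto):
        # Replay, in log order, every entry this replica has not yet seen.
        for op in self._log.read(self._applied, upto):
            op(self._state)
        self._applied = max(self._applied, upto)

    def update(self, op):
        end = self._log.append(op)  # global order is fixed by the log
        with self._lock:
            self._catch_up(end)

    def read(self, query):
        with self._lock:
            self._catch_up(self._log.tail())  # see all logged updates
            return query(self._state)


def put(key, value):
    """Build an update operation for a dict-like sequential structure."""
    def op(d):
        d[key] = value
    return op


# Usage: two replicas of a plain dict, one per (simulated) node.
log = SharedLog()
node0 = NodeReplica(log, dict)
node1 = NodeReplica(log, dict)
node0.update(put("k", 1))
node1.update(put("k", 2))
latest = node0.read(lambda d: d["k"])
```

In this sketch the sequential structure (a `dict`) is used unchanged, which mirrors the abstract's claim that no expertise in concurrent data structure design is needed. Reads touch only the local replica after catching up on the log.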

Published in

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN: 9781450390682
DOI: 10.1145/3472456

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
NUMA multicore
concurrent data structure
synchronization
universal construction
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 91 of 313 submissions, 29%