DOI: 10.1145/3503221.3508426
research-article

The performance power of software combining in persistence

Published: 28 March 2022

Abstract

The availability of Non-Volatile Main Memory (NVMM) enables the design of recoverable concurrent algorithms. We study the power of software combining in achieving recoverable synchronization and in designing persistent data structures. Software combining is a general synchronization approach that attempts to simulate the ideal world when executing synchronization requests (i.e., requests that must be executed in mutual exclusion). A single thread, called the combiner, executes all active requests, while the remaining threads wait for the combiner to notify them that their requests have been applied. Software combining significantly decreases the synchronization cost and outperforms many other synchronization techniques in various cases.
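As a rough illustration of the combining idea described above, the following Python sketch elects the combiner with a lock. The names CombiningObject, execute, and counter_apply are hypothetical, and the lock-based election is an assumption of this sketch rather than one of the protocols in the paper. Each thread announces its request in a per-thread slot; whichever thread wins the lock applies every announced request to the sequential object, while the other threads wait on an event until they are notified.

```python
import threading


class CombiningObject:
    """Hypothetical lock-based software combining sketch (not the paper's protocol)."""

    def __init__(self, nthreads, apply_fn, state):
        self.apply_fn = apply_fn            # sequential op: (state, op, arg) -> (new_state, response)
        self.state = state                  # sequential object; only the combiner touches it
        self.requests = [None] * nthreads   # announced (op, arg) per thread, or None
        self.responses = [None] * nthreads
        self.done = [threading.Event() for _ in range(nthreads)]
        self.combiner_lock = threading.Lock()

    def execute(self, tid, op, arg=None):
        self.done[tid].clear()
        self.requests[tid] = (op, arg)      # announce the request
        while not self.done[tid].is_set():
            # Try to become the combiner; otherwise wait to be served by the current one.
            if self.combiner_lock.acquire(blocking=False):
                try:
                    self._combine()
                finally:
                    self.combiner_lock.release()
            else:
                self.done[tid].wait(timeout=0.001)
        return self.responses[tid]

    def _combine(self):
        # Apply every active request, including the combiner's own, in slot order.
        for i, req in enumerate(self.requests):
            if req is not None:
                op, arg = req
                self.state, self.responses[i] = self.apply_fn(self.state, op, arg)
                self.requests[i] = None
                self.done[i].set()          # notify the owner that its request was applied


def counter_apply(state, op, arg):
    # Fetch-and-add: the new state is state + arg, the response is the old value.
    if op != "fetch_add":
        raise ValueError(op)
    return state + arg, state


obj = CombiningObject(4, counter_apply, 0)
results = []


def worker(tid):
    for _ in range(1000):
        results.append(obj.execute(tid, "fetch_add", 1))


threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(obj.state)  # 4000
```

Under CPython's global interpreter lock this only shows the control flow of combining (announce, elect a combiner, serve everyone, notify); the performance benefits described above would only appear in a native implementation.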
We identify three persistence principles, crucial for performance, that an algorithm designer must take into account when designing highly efficient recoverable synchronization protocols or data structures. We illustrate how to make the appropriate design decisions at all stages of devising recoverable combining protocols so that these principles are respected. Specifically, we present two recoverable software combining protocols, satisfying different progress properties, that are many times faster and have much lower persistence cost than a large collection of existing persistent techniques for achieving scalable synchronization. Based on these protocols, we build fundamental recoverable data structures, such as stacks and queues, that far outperform existing recoverable implementations of such data structures. We also provide the first recoverable implementation of a concurrent heap and present experiments showing that it performs well when the heap is not very large.
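To make the persistence side more concrete, here is a single-threaded Python simulation of recoverable combining rounds applied to a stack. Everything in it is an assumption for illustration. SimulatedNVM is a toy memory model in which only flushed writes survive a crash, with flush standing in for a write-back followed by a persistence fence. RecoverableCombiningStack, announce, combine, recover_response, the double-buffered state, the per-thread sequence numbers, and the crash_before_switch test hook are hypothetical names, not the paper's protocols and not a restatement of its three principles. In this sketch the combiner persists a whole batch with one flush of a spare state copy and then makes it current by flipping a persisted index, so a crash between the two steps leaves the previous state intact. After recovery, a thread learns whether its request took effect by comparing its sequence number with the persisted one.

```python
import copy


class SimulatedNVM:
    """Toy NVMM model (assumption): writes go to a volatile cache and survive
    a crash only after flush() copies them to the persistent image."""

    def __init__(self):
        self.cache, self.persisted = {}, {}

    def write(self, key, value):
        self.cache[key] = value

    def read(self, key):
        return self.cache[key]

    def flush(self, *keys):
        # Stands in for a write-back plus a persistence fence on real hardware.
        for k in keys:
            self.persisted[k] = copy.deepcopy(self.cache[k])

    def crash(self):
        # Everything that was not flushed is lost.
        self.cache = copy.deepcopy(self.persisted)


class RecoverableCombiningStack:
    """Hypothetical recoverable combining stack over a double-buffered state."""

    def __init__(self, nvm, nthreads):
        self.nvm, self.n = nvm, nthreads
        empty = {"stack": [], "applied": [0] * nthreads, "resp": [None] * nthreads}
        nvm.write("state0", empty)
        nvm.write("state1", copy.deepcopy(empty))
        nvm.write("idx", 0)                      # which copy is current
        for i in range(nthreads):
            nvm.write(f"req{i}", None)
        nvm.flush("state0", "state1", "idx", *[f"req{i}" for i in range(nthreads)])

    def announce(self, tid, op, arg, seq):
        # Persist the request so that recovery can tell it was issued.
        self.nvm.write(f"req{tid}", (op, arg, seq))
        self.nvm.flush(f"req{tid}")

    def combine(self, crash_before_switch=False):
        cur = self.nvm.read("idx")
        new = copy.deepcopy(self.nvm.read(f"state{cur}"))  # build the batch on a copy
        for i in range(self.n):
            req = self.nvm.read(f"req{i}")
            if req is not None and req[2] > new["applied"][i]:  # not yet applied
                op, arg, seq = req
                if op == "push":
                    new["stack"].append(arg)
                    new["resp"][i] = True
                else:  # "pop"; None signals an empty stack
                    new["resp"][i] = new["stack"].pop() if new["stack"] else None
                new["applied"][i] = seq
        self.nvm.write(f"state{1 - cur}", new)
        self.nvm.flush(f"state{1 - cur}")        # one flush persists the whole batch
        if crash_before_switch:                  # hypothetical test hook
            return
        self.nvm.write("idx", 1 - cur)
        self.nvm.flush("idx")                    # flipping the index publishes the batch

    def recover_response(self, tid, seq):
        # Returns (applied?, response) for tid's request numbered seq.
        st = self.nvm.read(f"state{self.nvm.read('idx')}")
        if st["applied"][tid] >= seq:
            return True, st["resp"][tid]
        return False, None


nvm = SimulatedNVM()
s = RecoverableCombiningStack(nvm, nthreads=2)
s.announce(0, "push", 10, seq=1)
s.announce(1, "push", 20, seq=1)
s.combine()                                  # one batch serves both threads
s.announce(0, "pop", None, seq=2)
s.combine(crash_before_switch=True)          # crash strikes before the index flip
nvm.crash()
print(s.recover_response(0, 2))              # (False, None): the pop must be redone
s.combine()                                  # recovery re-runs the combiner
print(s.recover_response(0, 2))              # (True, 20)
```

Because flush() is synchronous in this toy model, the batch is always durable before the index flip is written; on real hardware that ordering has to be enforced explicitly with a persistence fence between the two steps.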


Cited By

  • (2025) When is recoverable consensus harder than consensus? Distributed Computing. DOI: 10.1007/s00446-025-00476-w. Online publication date: 7 Feb 2025.
  • (2023) Memento: A Framework for Detectable Recoverability in Persistent Memory. Proceedings of the ACM on Programming Languages 7, PLDI, 292-317. DOI: 10.1145/3591232. Online publication date: 6 Jun 2023.
  • (2023) TL4x. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 245-259. DOI: 10.1145/3572848.3577495. Online publication date: 25 Feb 2023.
  • (2022) When is Recoverable Consensus Harder Than Consensus? Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, 198-208. DOI: 10.1145/3519270.3538418. Online publication date: 20 Jul 2022.


Information

Published In

PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
April 2022
495 pages
ISBN:9781450392044
DOI:10.1145/3503221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. concurrent data structures
  2. heap
  3. non-volatile memory
  4. nvm-based computing
  5. performance analysis
  6. performance principles
  7. persistence
  8. queue
  9. recoverable algorithms and data structures
  10. software combining
  11. stack
  12. synchronization
  13. wait-freedom


Conference

PPoPP '22

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
