DOI: 10.1145/2486159.2486189 · Research article

Reducing contention through priority updates

Published: 23 July 2013

Abstract

Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a single thread doing all the writes. Shared write access, however, can be very useful in parallel algorithms, concurrent data structures, and protocols for communicating among threads.
We study the "priority update" operation as a useful primitive for limiting write contention in parallel and concurrent programs. A priority update takes as arguments a memory location, a new value, and a comparison function >p that enforces a partial order over values. The operation atomically compares the new value with the current value in the memory location, and writes the new value only if it has higher priority according to >p. On the implementation side, we show that if implemented appropriately, priority updates greatly reduce memory contention over standard writes or other atomic operations when locations have a high degree of sharing. This is shown both experimentally and theoretically. On the application side, we describe several uses of priority updates for implementing parallel algorithms and concurrent data structures, often in a way that is deterministic, guarantees progress, and avoids serial bottlenecks. We present experiments showing that a variety of such algorithms and data structures perform well under high degrees of sharing. Given the results, we believe that the priority update operation serves as a useful parallel primitive and good programming abstraction as (1) the user largely need not worry about the degree of sharing, (2) it can be used to avoid non-determinism since, in the common case when >p is a total order, priority updates commute, and (3) it has many applications to programs using shared data.



Published In

cover image ACM Conferences
SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
July 2013
348 pages
ISBN:9781450315722
DOI:10.1145/2486159
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory contention
  2. parallel programming

Qualifiers

  • Research-article

Conference

SPAA '13

Acceptance Rates

SPAA '13 Paper Acceptance Rate 31 of 130 submissions, 24%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%


Cited By

  • (2023) Towards Lightweight and Automated Representation Learning System for Networks. IEEE Transactions on Knowledge and Data Engineering 35(9):9613-9627. DOI: 10.1109/TKDE.2023.3243169
  • (2023) Parallel Filtered Graphs for Hierarchical Clustering. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1967-1980. DOI: 10.1109/ICDE55515.2023.00153
  • (2022) A scalable architecture for reprioritizing ordered parallelism. Proceedings of the 49th Annual International Symposium on Computer Architecture, 437-453. DOI: 10.1145/3470496.3527387
  • (2022) Parallelization Strategies for Hierarchical Density-Based Clustering Algorithm Using OpenMP for Scan-To-BIM Applications. Proceedings of the Canadian Society of Civil Engineering Annual Conference 2021, 541-552. DOI: 10.1007/978-981-19-0968-9_43
  • (2021) ParChain. Proceedings of the VLDB Endowment 15(2):285-298. DOI: 10.14778/3489496.3489509
  • (2020) Improving the Space-Time Efficiency of Matrix Multiplication Algorithms. 49th International Conference on Parallel Processing - ICPP: Workshops, 1-10. DOI: 10.1145/3409390.3409404
  • (2020) High-Quality Shared-Memory Graph Partitioning. IEEE Transactions on Parallel and Distributed Systems 31(11):2710-2722. DOI: 10.1109/TPDS.2020.3001645
  • (2019) Tapir. ACM Transactions on Parallel Computing 6(4):1-33. DOI: 10.1145/3365655
  • (2019) Efficiency Guarantees for Parallel Incremental Algorithms under Relaxed Schedulers. The 31st ACM Symposium on Parallelism in Algorithms and Architectures, 145-154. DOI: 10.1145/3323165.3323201
  • (2018) Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, 393-404. DOI: 10.1145/3210377.3210414
