
Modeling the Performance of Atomic Primitives on Modern Architectures

Published: 05 August 2019

Abstract

Using a processor's atomic primitives to access a memory location atomically is key to the correctness and feasibility of parallel software systems. The performance of these atomics strongly affects the scalability and overall performance of such systems.
In this work, we study the performance of atomic primitives, in terms of latency, throughput, fairness, and energy consumption, in the two common software execution settings that lead to high-contention and low-contention access to shared memory. We present an exhaustive study of atomics in these two settings on two state-of-the-art architectures, Intel Xeon E5 and Intel Xeon Phi (KNL), and propose a performance model that captures their behavior. The model is centered on the bouncing of cache lines between the threads that execute atomic primitives on those shared cache lines. It is simple enough to use in practice, captures the behavior of atomics accurately under these execution scenarios, and facilitates algorithmic design decisions in multi-threaded programming.
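
The high-contention and low-contention settings described above can be approximated with a small microbenchmark. The sketch below is illustrative only and is not the authors' benchmark or model: the helper `run_fetch_add`, the `RunResult` struct, the `local_work` parameter, and the 64-byte cache line size are all assumptions introduced here. Each thread repeatedly applies a fetch-and-add to one shared counter, so the cache line holding the counter bounces between cores; private work between consecutive atomics lowers the contention.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Result of one run: final counter value and elapsed wall-clock seconds.
struct RunResult {
    std::uint64_t total;
    double seconds;
};

// Hypothetical helper (not from the paper): each thread performs `ops`
// fetch-and-add operations on one shared counter, with `local_work`
// iterations of private work between consecutive atomics.
// local_work == 0 approximates the high-contention setting;
// a large local_work approximates the low-contention setting.
RunResult run_fetch_add(unsigned threads, std::uint64_t ops, unsigned local_work) {
    assert(threads > 0);
    // Assumption: 64-byte cache lines; the counter gets a line-aligned address.
    alignas(64) std::atomic<std::uint64_t> counter{0};
    std::atomic<bool> go{false};

    auto worker = [&] {
        // Wait for the start signal so all threads contend at the same time.
        while (!go.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        volatile unsigned sink = 0;  // keeps the private work from being optimized away
        for (std::uint64_t i = 0; i < ops; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
            for (unsigned w = 0; w < local_work; ++w) {
                sink = sink + 1;
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back(worker);
    }

    auto start = std::chrono::steady_clock::now();
    go.store(true, std::memory_order_release);
    for (auto& th : pool) {
        th.join();
    }
    auto stop = std::chrono::steady_clock::now();

    return {counter.load(), std::chrono::duration<double>(stop - start).count()};
}
```

Dividing `total` by `seconds` gives an aggregate throughput figure. Comparing, say, `run_fetch_add(n, ops, 0)` with `run_fetch_add(n, ops, 1000)` for a growing thread count `n` gives a rough view of how contention on a single cache line affects that figure. Absolute numbers depend on the machine, thread placement, and compiler flags, and this sketch measures neither fairness nor energy.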



Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN: 9781450362955
DOI: 10.1145/3337821

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Atomic Primitives
  2. Concurrency
  3. Modeling
  4. Parallel Computing
  5. Performance
  6. Synchronization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Vetenskapsrådet
  • Stiftelsen för Strategisk Forskning

Conference

ICPP 2019

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%


Cited By

  • (2023) Parallel Inference of Phylogenetic Stands with Gentrius. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 139-148. DOI: 10.1109/IPDPSW59300.2023.00035. Online publication date: May-2023
  • (2023) Novel insights on atomic synchronization for sort-based group-by on GPUs. Distributed and Parallel Databases 41:3, 387-409. DOI: 10.1007/s10619-023-07424-2. Online publication date: 24-Apr-2023
  • (2022) Performance Analysis and Modelling of Concurrent Multi-access Data Structures. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, 333-344. DOI: 10.1145/3490148.3538578. Online publication date: 11-Jul-2022
  • (2021) An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs. 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW), 48-53. DOI: 10.1109/ICDEW53142.2021.00016. Online publication date: Apr-2021
  • (2021) CircusTent: A Tool for Measuring the Performance of Atomic Memory Operations on Emerging Architectures. OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 92-110. DOI: 10.1007/978-3-031-04888-3_6. Online publication date: 13-Sep-2021
  • (2020) CircusTent: A Benchmark Suite for Atomic Memory Operations. Proceedings of the International Symposium on Memory Systems, 144-157. DOI: 10.1145/3422575.3422789. Online publication date: 28-Sep-2020
