CircusTent: A Tool for Measuring the Performance of Atomic Memory Operations on Emerging Architectures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13159)

Abstract

Endeavors to engineer the next generation of exascale platforms have resulted in a fundamental shift in system architectures. Contrary to what was once considered conventional wisdom, high performance systems designed today are characterized by heterogeneous architectures wherein distinct components are carefully combined in order to optimize system performance and energy efficiency. One unintended consequence of this new paradigm is an increasingly complex memory hierarchy that frequently spans multiple devices and may be composed of disparate memory types. Unfortunately, the performance effect of this new memory model is not well understood. Moreover, a quantifiable, system-agnostic methodology capable of assessing the performance of the diverse memory subsystems within emerging architectures has been lacking. The CircusTent benchmark suite was introduced to fill this void by measuring system performance with respect to atomic memory operations using established parallel programming models. However, a detailed description and evaluation of CircusTent in a distributed memory environment, critical to both current and future system architectures, has yet to be produced. In this work, we rectify this shortcoming by introducing CircusTent implementations based on the OpenSHMEM and MPI programming models and evaluating these implementations across a variety of platforms. We then detail our conclusions and characterize our observations regarding the effect of different system interconnects, memory hierarchies, and instruction set architectures on system performance.

Notes

  1. In this work, we use the generic term “NIC” to refer to network adapters in both Ethernet and Cray Aries networks as well as InfiniBand HCAs.

Acknowledgments

The authors would like to thank Los Alamos National Laboratory for use of the Trinitite and Capulin systems during the evaluation of this work. This study is authorized for unlimited release under LA-UR-21-28928.

Author information

Corresponding author

Correspondence to Brody Williams.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Williams, B., Leidel, J.D., Wang, X., Donofrio, D., Chen, Y. (2022). CircusTent: A Tool for Measuring the Performance of Atomic Memory Operations on Emerging Architectures. In: Poole, S., Hernandez, O., Baker, M., Curtis, T. (eds) OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks. OpenSHMEM 2021. Lecture Notes in Computer Science, vol 13159. Springer, Cham. https://doi.org/10.1007/978-3-031-04888-3_6

  • DOI: https://doi.org/10.1007/978-3-031-04888-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04887-6

  • Online ISBN: 978-3-031-04888-3

  • eBook Packages: Computer Science, Computer Science (R0)