
Generic Support for Remote Memory Access Operations in Score-P and OTF2

  • Conference paper
  • In: Tools for High Performance Computing 2012

Abstract

Remote memory access (RMA) describes the ability of a process to directly access all or part of the memory belonging to a remote process, without explicit participation of the remote side. A number of parallel programming models relevant for High Performance Computing (HPC) are based on RMA operations. On the one hand, Partitioned Global Address Space (PGAS) language extensions such as Co-Array Fortran and UPC use RMA operations as their underlying communication substrate. On the other hand, RMA programming APIs provide so-called one-sided data transfer primitives as an alternative to classic two-sided message passing. In this paper, we describe how Score-P, a scalable performance measurement infrastructure for parallel applications, is extended to support trace-based performance analysis of RMA parallelization models. Emphasis is given to the generic event model we designed to record RMA operations in the OTF2 trace format across a range of one-sided APIs and libraries.
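
For readers less familiar with one-sided communication, the following sketch (not part of the paper; a minimal MPI RMA example in C, MPI being one of the one-sided interfaces a generic event model of this kind can cover) shows a put that deposits data in another process's memory without a matching receive on the target side:

    /* Minimal one-sided transfer with MPI RMA (illustrative sketch only;
       the concrete values and buffer layout are not taken from the paper). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process exposes one integer through an RMA window. */
        int local = rank;
        MPI_Win win;
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* Rank 0 writes into rank 1's exposed memory; rank 1 does not
           post a receive or otherwise participate in the transfer. */
        if (rank == 0 && size > 1) {
            int value = 42;
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, /* target rank */ 1,
                    /* displacement */ 0, 1, MPI_INT, win);
            MPI_Win_unlock(1, win); /* completes the put */
        }

        MPI_Barrier(MPI_COMM_WORLD);
        printf("rank %d: local = %d\n", rank, local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

A measurement system such as Score-P intercepts operations like MPI_Put and MPI_Get and records them as events in the trace; operations of this kind are the class that the generic OTF2 event model described in the paper is designed to capture.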


Notes

  1. Sometimes they are called read and write instead, but get and put are the typical terms. To avoid confusion, load and store are used exclusively for local memory accesses.


Author information

Corresponding author

Correspondence to Andreas Knüpfer.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Knüpfer, A. et al. (2013). Generic Support for Remote Memory Access Operations in Score-P and OTF2. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37349-7_5
