
Generic Support for Remote Memory Access Operations in Score-P and OTF2

  • Conference paper
  • In: Tools for High Performance Computing 2012

Abstract

Remote memory access (RMA) describes the ability of a process to directly access all or part of the memory belonging to a remote process, without explicit participation of the remote side. A number of parallel programming models relevant for High Performance Computing (HPC) are based on RMA operations. On the one hand, Partitioned Global Address Space (PGAS) language extensions such as Co-Array Fortran and UPC use RMA operations as their underlying communication substrate. On the other hand, RMA programming APIs provide so-called one-sided data transfer primitives as an alternative to classic two-sided message passing. In this paper, we describe how Score-P, a scalable performance measurement infrastructure for parallel applications, is extended to support trace-based performance analysis of RMA parallelization models. Emphasis is given to the generic event model we designed to record RMA operations in the OTF2 trace format across a range of one-sided APIs and libraries.
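
For readers less familiar with one-sided communication, the following sketch (not part of the paper; a minimal MPI RMA example in C, MPI being one of the one-sided interfaces a generic event model of this kind can cover) shows a put that deposits data in another process's memory without a matching receive on the target side:

    /* Minimal one-sided transfer with MPI RMA (illustrative sketch only;
       the concrete values and buffer layout are not taken from the paper). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process exposes one integer through an RMA window. */
        int local = rank;
        MPI_Win win;
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* Rank 0 writes into rank 1's exposed memory; rank 1 does not
           post a receive or otherwise participate in the transfer. */
        if (rank == 0 && size > 1) {
            int value = 42;
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, /* target rank */ 1,
                    /* displacement */ 0, 1, MPI_INT, win);
            MPI_Win_unlock(1, win); /* completes the put */
        }

        MPI_Barrier(MPI_COMM_WORLD);
        printf("rank %d: local = %d\n", rank, local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

A measurement system such as Score-P intercepts operations like MPI_Put and MPI_Get and records them as events in the trace; operations of this kind are the class that the generic OTF2 event model described in the paper is designed to capture.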


Notes

  1. Sometimes they are called read and write instead, but get and put are the typical terms. To avoid confusion, load and store are used exclusively for local memory accesses.


Author information

Corresponding author

Correspondence to Andreas Knüpfer.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Knüpfer, A. et al. (2013). Generic Support for Remote Memory Access Operations in Score-P and OTF2. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37349-7_5
