OpenMP Target Device Offloading for the SX-Aurora TSUBASA Vector Engine

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12043)

Abstract

Driven by the trend toward heterogeneity in modern supercomputers, OpenMP has supported heterogeneous systems since 2013. A single programming model for all kinds of accelerator-based systems reduces the burden of porting code to different device types. Acceptance of this heterogeneous paradigm requires OpenMP compilers and runtime environments that support the different target device architectures. The LLVM/Clang infrastructure is designed so that its offloading features can be extended to any new target platform. However, this presupposes a compatible compiler backend for the target architecture. To overcome this limitation, we present a source-to-source code transformation technique that outlines the OpenMP code regions intended for the target device. By combining this technique with a corresponding communication layer, we enable OpenMP target offloading to the NEC SX-Aurora TSUBASA vector engine, which represents the new generation of vector computing.
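
To make the idea of outlining a target region more concrete, the following C fragment is a minimal sketch and not the transformation the paper implements. It shows a loop annotated with a standard OpenMP `target` construct next to a hand-written function holding the same loop body, which is roughly what a source-to-source tool could emit for a separate device compiler. The function names `saxpy_target`, `saxpy_outlined_region`, and `outlining_demo` are hypothetical, and the communication layer that would move data and launch the outlined function on the vector engine is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: a loop annotated for offloading. A compiler without
   offloading support ignores the pragma or falls back to the host. */
void saxpy_target(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical outlined form: the body of the target region moved into
   a standalone function that a device compiler could build on its own.
   Name and signature are illustrative assumptions only. */
void saxpy_outlined_region(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Host-side check that both forms compute the same values.
   Returns the number of elements that differ from a * x[i] + 1. */
int outlining_demo(void)
{
    enum { N = 8 };
    float x[N], y1[N], y2[N];
    int mismatches = 0;

    for (size_t i = 0; i < N; ++i) {
        x[i] = (float)i;
        y1[i] = y2[i] = 1.0f;
    }

    saxpy_target(N, 2.0f, x, y1);
    saxpy_outlined_region(N, 2.0f, x, y2);

    for (size_t i = 0; i < N; ++i) {
        assert(y1[i] == y2[i]);              /* both forms must agree */
        if (y1[i] != 2.0f * (float)i + 1.0f)
            ++mismatches;
    }
    return mismatches;
}
```

Calling `outlining_demo()` from a host program exercises both variants. Keeping the outlined body identical to the annotated loop is the point of the sketch: only the place where the code is compiled and executed would change, not what it computes.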

Notes

  1. https://github.com/veos-sxarr-NEC/veoffload

  2. https://github.com/RWTH-HPC

  3. https://rwth-hpc.github.io/sx-aurora-offloading

  4. https://github.com/clang-omp/OffloadingDesign

  5. Clang allows x86 to be defined as the target device for testing purposes; the target regions are then executed on the host, but through the corresponding plugin in libomptarget.

Author information

Correspondence to Tim Cramer.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Cramer, T., Römmer, M., Kosmynin, B., Focht, E., Müller, M.S. (2020). OpenMP Target Device Offloading for the SX-Aurora TSUBASA Vector Engine. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_21

  • DOI: https://doi.org/10.1007/978-3-030-43229-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43228-7

  • Online ISBN: 978-3-030-43229-4

  • eBook Packages: Computer Science, Computer Science (R0)