OpenACC to Intel Offload: Automatic Translation and Optimization

  • Conference paper
Computer Engineering and Technology (NCCET 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 396)

Abstract

Heterogeneous architectures that combine conventional CPUs with coprocessors have become popular in the design of High Performance Computing systems, and programming such architectures is a frequently studied problem. The OpenACC standard was proposed to tackle this problem through directive-based, high-level programming for coprocessors. In this paper, we use OpenACC to program the new Intel MIC coprocessor by automatically translating OpenACC source code into Intel Offload code. Two optimizations are applied: communication optimization and SIMD optimization. Two kernels, matrix multiplication and Jacobi, are studied on a MIC-based platform (one Knights Corner card) and a GPU-based platform (one NVIDIA Tesla K20c card). Performance evaluation shows that both kernels achieve a speedup of approximately 3 on one Knights Corner card over one eight-core Intel Xeon E5-2670 CPU. Moreover, both kernels perform better on the MIC-based platform than on the GPU-based one.
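To make the translation idea concrete, the following is a minimal sketch of what an OpenACC matrix multiplication and a hand-written Intel Offload counterpart might look like. It is not taken from the paper: the function names `matmul_acc` and `matmul_offload`, the choice of data clauses, and the use of `#pragma simd` for the SIMD step are all assumptions made for illustration, and the code the authors' translator actually emits may differ. A compiler without OpenACC or Intel Offload support simply ignores the unknown pragmas and runs both functions on the host.

```c
#include <assert.h>
#include <stddef.h>

/* OpenACC form: the kind of input a translator would consume.
 * (Illustrative only; names and clauses are assumptions.) */
void matmul_acc(const float *a, const float *b, float *c, int n)
{
    assert(a && b && c && n > 0);
    #pragma acc kernels copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}

/* Intel Offload form: a hypothetical hand translation of the above.
 * copyin -> in(...), copyout -> out(...); the outer loop is run by
 * OpenMP threads on the coprocessor and the inner reduction loop is
 * marked for vectorization (assumed stand-in for "SIMD optimization"). */
void matmul_offload(const float *a, const float *b, float *c, int n)
{
    assert(a && b && c && n > 0);
    #pragma offload target(mic) in(a : length(n * n)) in(b : length(n * n)) out(c : length(n * n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                #pragma simd reduction(+:sum)
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}
```

In this sketch the data clauses map one to one, which is the simplest case; the communication optimization mentioned in the abstract would presumably go further, for example by avoiding repeated transfers of arrays that stay on the coprocessor between kernels.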




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, C., Yang, C., Tang, T., Wu, Q., Zhang, P. (2013). OpenACC to Intel Offload: Automatic Translation and Optimization. In: Xu, W., Xiao, L., Zhang, C., Li, J., Yu, L. (eds) Computer Engineering and Technology. NCCET 2013. Communications in Computer and Information Science, vol 396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41635-4_12


  • DOI: https://doi.org/10.1007/978-3-642-41635-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41634-7

  • Online ISBN: 978-3-642-41635-4

  • eBook Packages: Computer Science (R0)
