Abstract
Heterogeneous architectures with both conventional CPUs and coprocessors become popular in the design of High Performance Computing systems. The programming problems on such architectures are frequently studied. OpenACC standard is proposed to tackle the problem by employing directive-based high-level programming for coprocessors. In this paper, we take advantage of OpenACC to program on the newly Intel MIC coprocessor. We achieve this by automatically translating the OpenACC source code to Intel Offload code. Two optimizations including communication and SIMD optimization are employed. Two kernels i.e. the matrix multiplication and JACOBI, are studied on the MIC-based platform (one knight Corner card) and the GPU-based platform (one NVIDIA Tesla k20c card). Performance evaluation shows that both kernels delivers a speedup of approximately 3 on one knight Corner card than on one Intel Xeon E5-2670 octal-core CPU. Moreover, the two kernels gain better performance on MIC-based platform than on the GPU-based one.
Keywords
References
Koesterke, L., Boisseau, J., Cazes, J., Milfeld, K., Stanzione, D.: Early Experiences with the Intel Many Integrated Cores Accelerated Computing Technology. In: TeraGrid 2011 (July 2011)
Elgar, T.: Intel Many Integrated Core (MIC) Architecture. In: 2nd UK GPU Computing Conference (December 2010)
NVIDIA, CUDA programming guide 2.1 (2009), http://developer.download.nvidia.com/compute/cuda/2.1/toolkit/do-cs/NVIDIA_CUDA_Programming_Guide_2.1.pdf
The OpenACC Application Programming Interface, Version 1.0 (November 2011)
Wienke, S., Springer, P., Terboven, C., an Mey, D.: OpenACC — first experiences with real-world applications. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 859–870. Springer, Heidelberg (2012)
OpenMP: The OpenMP API Specication for Parallel Programming, http://openmp.org/wp/openmp-specications/
MPI-2: Extensions to the Message-Passing Interface, Message Passing Interface Forum (July 1997)
I. Corporation. The Intel Xeon phi coprocessor: Parallel processing, unparalleled discover. Intel’ Software Network (2007)
Knights Corner Software Developers Guide, revision 1.03 (April 27, 2012)
Wu, Q., Yang, C., Tang, T., Xiao, L.: MIC Acceleration of Short-Range Molecular Dynamics Simulations. In: CGOW (January 2013)
Reyes, R., Lopez, I., Fumero, J.J., de Sande, F.: Sande.accULL: A User-directed Approach to Heterogeneous Programming (2012)
Lee, S., Min, S., Eigenmann, R.: OpenMP to GPGPU: A compiler framework for automatic translation and optimization. SIGPLANNot. (February 2009)
Wei, H., Yu, J.: Loading OpenMP to Cell: An Effective Compiler Framework for Heterogeneous Multi-core Chip
Dave, C., Bae, H., Min, S.-J., Lee, S., Eigenmann, R., Midkiff, S.: Cetus: A source-to-source compiler infrastructure for multicores. Computer 42(12) (2009)
Reyes, R., López-Rodríguez, I., Fumero, J.J., de Sande, F.: accULL: An OpenACC Implementation with CUDA and OpenCL Support. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 871–882. Springer, Heidelberg (2012)
Reyes, R., de Sande, F.: Automatic code generation for GPUs in llc. The Journal of Supercomputing 58(3) (March 2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, C., Yang, C., Tang, T., Wu, Q., Zhang, P. (2013). OpenACC to Intel Offload: Automatic Translation and Optimization. In: Xu, W., Xiao, L., Zhang, C., Li, J., Yu, L. (eds) Computer Engineering and Technology. NCCET 2013. Communications in Computer and Information Science, vol 396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41635-4_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-41635-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41634-7
Online ISBN: 978-3-642-41635-4
eBook Packages: Computer ScienceComputer Science (R0)