Abstract
OpenACC has become a popular programming interface for many-core application programming. Internationally, a great deal of research has been done on OpenACC for CPU + GPU heterogeneous many-core architectures. Among the available compilers, the PGI OpenACC compiler developed by NVIDIA is the most advanced. However, there has been little research on OpenACC for the Home Grown Heterogeneous Many-Core (HGHM) architecture, which differs from the GPU. This paper proposes an automatic mapping technique, built on the OpenACC compiler, that maps OpenACC kernel code to a heterogeneous and deeply fused many-core architecture. Our approach uses the compiler's static analysis together with feedback-driven dynamic analysis to automatically map a program's parallel kernel code to many-core devices, which greatly improves the quality of the compiler's transformation. Experimental results show that this technique greatly improves the efficiency of using OpenACC to port applications to heterogeneous and fused many-core systems without degrading program acceleration performance.
Acknowledgements
This work is supported by (1) the National Key R&D Program of China (Grant no. 2017YFB02-02004), (2) the Project of manned space engineering technology (2018-14), (3) “Large-scale parallel computation of aerodynamic problems of irregular spacecraft reentry covering various flow regimes”, and (4) the National Natural Science Foundation of China (91530319).
Cite this article
Zhang, L., Mao, X., You, H. et al. An automatic mapping technique for OpenACC kernel code based on deeply fused and heterogeneous many-core architecture. CCF Trans. HPC 2, 323–331 (2020). https://doi.org/10.1007/s42514-020-00050-9
DOI: https://doi.org/10.1007/s42514-020-00050-9