An automatic mapping technique for OpenACC kernel code based on deeply fused and heterogeneous many-core architecture

Zhang, Libo; Mao, Xingquan; You, Hongtao; Gu, Long; Jiang, Xiaocheng

doi:10.1007/s42514-020-00050-9

Regular Paper
Published: 19 November 2020

CCF Transactions on High Performance Computing, Volume 2, pages 323–331 (2020)

Libo Zhang (ORCID: orcid.org/0000-0002-7410-4799), Xingquan Mao, Hongtao You, Long Gu & Xiaocheng Jiang

Abstract

OpenACC has become a popular programming interface for many-core applications. Much research has been done internationally on OpenACC for CPU + GPU heterogeneous many-core architectures; among the resulting compilers, the PGI OpenACC compiler developed by NVIDIA is the most advanced. There is, however, little research on OpenACC for the Home Grown Heterogeneous Many-Core (HGHM) architecture, which differs from a GPU. This paper proposes an automatic mapping technique, built on the OpenACC compiler, that maps OpenACC kernel code to a heterogeneous and deeply fused many-core architecture. The approach combines the compiler's static analysis with feedback-based dynamic analysis to map a program's parallel kernel code to many-core devices automatically, and it greatly improves the quality of the compiler's transformations. Experimental results show that the technique greatly improves the efficiency of porting applications to heterogeneous, fused many-core systems with OpenACC, without degrading the program's accelerated performance.

Acknowledgements

This work is supported by (1) the National Key R&D Program of China (Grant no. 2017YFB02-02004), (2) the Project of manned space engineering technology (2018-14), (3) “Large-scale parallel computation of aerodynamic problems of irregular spacecraft reentry covering various flow regimes”, and (4) the National Natural Science Foundation of China (91530319).

Author information

Authors and Affiliations

Wuxi Jiangnan Institute of Computing Technology, Wuxi, China
Libo Zhang, Xingquan Mao, Hongtao You, Long Gu & Xiaocheng Jiang

Corresponding author

Correspondence to Libo Zhang.

About this article

Cite this article

Zhang, L., Mao, X., You, H. et al. An automatic mapping technique for OpenACC kernel code based on deeply fused and heterogeneous many-core architecture. CCF Trans. HPC 2, 323–331 (2020). https://doi.org/10.1007/s42514-020-00050-9

Received: 03 April 2020
Accepted: 12 September 2020
Published: 19 November 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s42514-020-00050-9
