Abstract
The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimization techniques to benefit from new high-performance computing machines. An extra care must be taken for communication-intensive algorithms, which may be a bottleneck for forthcoming era of exascale computing. This paper aims to present a high level stencil framework implemented for the EULAG model that efficiently utilizes heterogeneous clusters. Only an efficient usage of both CPUs and GPUs with the flexible data decomposition method can lead to the maximum performance that scales communication-intensive elliptic solver with preconditioner.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Kurzak, J., Bader, D., Dongarra, J.: Scientific Computing with Multicore and Accelerators. Computer and Information Science Series. Chapmann & Hall/CRC, Boca Raton (2010)
Georgescu, S., Okuda, H.: Conjugate gradients on multiple GPUs. Int J. Numer. Meth. Fluids 64, 1254–1273 (2010)
Zhang, Y., Cohen, J.M., Owens, J.D.: Fast tridiagonal solvers on GPU. In: Newsletter ACM SIGPLAN Notices - PPoPP, vol. 45, p. 5 (2010)
Prusa, J.M., Smolarkiewicz, P.K., Wyszogrodzki, A.: Eulag a computational model for multiscale flows. Comput. Fluids 37, 1193–1207 (2008)
Smolarkiewicz, P.K., Margolin, L.G.: Variational methods for elliptic problems in fluid models. In: Proceedings of ECMWF Workshop on Developments in Numerical Methods for Very High Resolution Global Models, vol. 7, pp. 137–159 (2000)
Kamil, S., Chan, C., Oliker, L., Shalf, J., Williams, S.: An auto-tuning framework for parallel multicore stencil computations. In: IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2010), pp. 1–12. IEEE (2010)
Christen, M., Schenk, O., Burkhart, H.: Patus: a code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In: IEEE International Parallel and Distributed Processing Symposium (IPDPS 2011), pp. 676–687. IEEE (2011)
Lutz, T., Fensch, C., Cole, M.: PARTANS: an autotuning framework for stencil computation on multi-GPU systems. ACM Trans. Archit. Code Optim. (TACO) 9(4), 59 (2013)
Blazewicz, M., Hinder, I., Koppelman, D.M., Brandt, S.R., Ciznicki, M., Kierzynka, M., Löffler, F., Schnetter, E., Tao, J.: From physics model to results: an optimizing framework for cross-architecture code generation. Sci. Program. 21(1–2), 1–16 (2013)
Szustak, L., Rojek, K., Olas, T., Kuczynski, L., Halbiniak, K., Gepner, P.: Adaptation of MPDATA heterogeneous stencil computation to Intel Xeon Phi coprocessor. Sci. Program. (2015)
Wyrzykowski, R., Szustak, L., Rojek, K.: Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators. Parallel Comput. 40, 425–447 (2014)
Maruyama, N., Nomura, T., Sato, K., Matsuoka, S.: Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011), pp. 1–12. IEEE (2011)
Pereira, A.D., Ramos, L., Góes, L.F.: PSkel: a stencil programming framework for CPU-GPU systems. In: Practice and Experience, Concurrency and Computation (2015)
Rojek, K.A., Ciznicki, M., Rosa, B., Kopta, P., Kulczewski, M., Kurowski, K., Piotrowski, Z.P., Szustak, L., Wojcik, D.K., Wyrzykowski, R.: Adaptation of fluid model EULAG to graphics processing unit architecture. In: Practice and Experience, Concurrency and Computation (2014)
Xue, W., Yang, C., Fu, H., Wang, X., Xu, Y., Gan, L., Lu, Y., Zhu, X.: Enabling and scaling a global shallow-water atmospheric model on tianhe-2. In: IEEE 28th International Parallel and Distributed Processing Symposium, pp. 745–754. IEEE (2014)
Ciznicki, M., Kopta, P., Kulczewski, M., Kurowski, K., Gepner, P.: Elliptic solver performance evaluation on modern hardware architectures. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2013, Part I. LNCS, vol. 8384, pp. 155–165. Springer, Heidelberg (2014)
Acknowledgements
This work is supported by the Polish National Center of Science under Grant No. UMO-2011/03/B/ST6/03500. This research was supported in part by PL-Grid Infrastructure. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d25.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ciznicki, M., Kulczewski, M., Kopta, P., Kurowski, K. (2016). Scaling the GCR Solver Using a High-Level Stencil Framework on Multi- and Many-Core Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science(), vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_55
Download citation
DOI: https://doi.org/10.1007/978-3-319-32152-3_55
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32151-6
Online ISBN: 978-3-319-32152-3
eBook Packages: Computer ScienceComputer Science (R0)