
Scaling the GCR Solver Using a High-Level Stencil Framework on Multi- and Many-Core Architectures

  • Conference paper
Parallel Processing and Applied Mathematics

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9574)

Abstract

The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimization techniques in order to benefit from new high-performance computing machines. Extra care must be taken with communication-intensive algorithms, which may become a bottleneck in the forthcoming era of exascale computing. This paper presents a high-level stencil framework, implemented for the EULAG model, that efficiently utilizes heterogeneous clusters. Only efficient use of both CPUs and GPUs, combined with a flexible data decomposition method, can deliver the maximum performance and scale a communication-intensive elliptic solver with a preconditioner.

Acknowledgements

This work is supported by the Polish National Center of Science under Grant No. UMO-2011/03/B/ST6/03500. This research was supported in part by PL-Grid Infrastructure. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d25.

Author information

Corresponding author

Correspondence to Milosz Ciznicki.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ciznicki, M., Kulczewski, M., Kopta, P., Kurowski, K. (2016). Scaling the GCR Solver Using a High-Level Stencil Framework on Multi- and Many-Core Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_55

  • DOI: https://doi.org/10.1007/978-3-319-32152-3_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32151-6

  • Online ISBN: 978-3-319-32152-3

  • eBook Packages: Computer Science, Computer Science (R0)
