Asynchronous AMR on Multi-GPUs

Farooqi, Muhammad Nufail; Nguyen, Tan; Zhang, Weiqun; Almgren, Ann S.; Shalf, John; Unat, Didem

doi:10.1007/978-3-030-34356-9_11

Conference paper
First Online: 03 December 2019

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

Adaptive Mesh Refinement (AMR) is a computationally efficient and memory-efficient technique for solving partial differential equations. Because many supercomputers now employ GPUs, AMR frameworks must evolve to adapt to large-scale heterogeneous systems. However, employing multiple GPUs and achieving good scalability in AMR is challenging because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system, which simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between the different processing units. Our runtime adapts to various machine configurations and uses a host-resident data model. It facilitates the use of streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies using an advection solver on the Piz Daint supercomputer and achieve high performance.
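
To make the idea of overlapping CPU-GPU transfers with computation concrete, the following is a minimal timing sketch, not the runtime described in the paper. It assumes an idealized device with one host-to-device copy engine, one compute engine, and one device-to-host copy engine, each handling tiles in order, and it ignores device memory limits and concurrent kernels. The names `Tile`, `serial_time`, and `overlapped_time`, and all timings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Per-tile costs in arbitrary time units (hypothetical values)."""
    h2d: float     # host-to-device copy time
    kernel: float  # kernel execution time
    d2h: float     # device-to-host copy time


def serial_time(tiles):
    """Single stream: every copy and kernel runs back to back."""
    return sum(t.h2d + t.kernel + t.d2h for t in tiles)


def overlapped_time(tiles):
    """Idealized multi-stream pipeline.

    Each engine (H2D copy, compute, D2H copy) handles one tile at a time,
    in order, but the three engines work on different tiles concurrently.
    """
    h2d_done = kernel_done = d2h_done = 0.0
    for t in tiles:
        # The upload starts as soon as the H2D engine is free.
        h2d_done = h2d_done + t.h2d
        # The kernel waits for its own upload and for the previous kernel.
        kernel_done = max(kernel_done, h2d_done) + t.kernel
        # The download waits for its own kernel and for the previous download.
        d2h_done = max(d2h_done, kernel_done) + t.d2h
    return d2h_done


if __name__ == "__main__":
    tiles = [Tile(h2d=1.0, kernel=2.0, d2h=1.0) for _ in range(4)]
    print("serial:    ", serial_time(tiles))
    print("overlapped:", overlapped_time(tiles))
```

In this model the overlapped schedule is bounded by the busiest engine, so with a host-resident data model the transfer cost is hidden only to the extent that kernels take at least as long as the copies they overlap with.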

Acknowledgements

This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87.

Author information

Authors and Affiliations

Koç University, Istanbul, Turkey
Muhammad Nufail Farooqi & Didem Unat
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Tan Nguyen, Weiqun Zhang, Ann S. Almgren & John Shalf

Corresponding author

Correspondence to Muhammad Nufail Farooqi.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Sachsen, Germany
Guido Juckeland
Swiss National Supercomputing Centre, Lugano, Ticino, Switzerland
Sadaf Alam
University of Tennessee at Knoxville, Knoxville, TN, USA
Heike Jagode

About this paper

Cite this paper

Farooqi, M.N., Nguyen, T., Zhang, W., Almgren, A.S., Shalf, J., Unat, D. (2019). Asynchronous AMR on Multi-GPUs. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_11

DOI: https://doi.org/10.1007/978-3-030-34356-9_11
Published: 03 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)
