Abstract
Adaptive Mesh Refinement (AMR) is a computationally efficient and memory-efficient technique for solving partial differential equations. Because many supercomputers now employ GPUs, AMR frameworks must evolve to run on large-scale heterogeneous systems. However, it is challenging to use multiple GPUs and achieve good scalability in AMR because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system, which simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between the different processing units. Our runtime adapts to various machine configurations and uses a host-resident data model. This model facilitates the use of streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies using an advection solver on the Piz Daint supercomputer and achieve high performance.
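A simple timing model can illustrate why overlapping CPU-GPU transfers with computation through streams pays off. The sketch below is not the authors' runtime. It is a hypothetical model that makes several assumptions. Each box (a patch of the AMR grid) passes through three stages: a host-to-device copy, a kernel, and a device-to-host copy. Each stage is served by its own hardware engine (two copy engines plus the compute engine). All boxes have identical stage times. The function names and timings are illustrative only.

```python
from typing import Sequence


def serial_makespan(n_boxes: int, stages: Sequence[float]) -> float:
    """Total time when each box completes all stages before the next box starts
    (e.g., one stream with blocking copies)."""
    return n_boxes * sum(stages)


def pipelined_makespan(n_boxes: int, stages: Sequence[float]) -> float:
    """Simulate boxes issued in order on independent streams.

    Hypothetical model: every stage (H2D copy, kernel, D2H copy) has its own
    engine that processes one box at a time, in issue order.
    """
    engine_free = [0.0] * len(stages)  # time at which each engine becomes idle
    finish = 0.0
    for _ in range(n_boxes):
        ready = 0.0  # host-resident data for every box is available at t = 0
        for s, duration in enumerate(stages):
            # A stage starts once the box's previous stage is done
            # and the engine for this stage is free.
            start = max(ready, engine_free[s])
            ready = start + duration
            engine_free[s] = ready
        finish = ready  # FIFO engines: the last box issued finishes last
    return finish


def pipelined_closed_form(n_boxes: int, stages: Sequence[float]) -> float:
    """Flow-shop result for identical jobs: fill the pipeline once,
    then one box completes per slowest-stage interval."""
    return sum(stages) + (n_boxes - 1) * max(stages)


if __name__ == "__main__":
    stages = (1.0, 2.0, 1.0)  # hypothetical H2D, kernel, D2H times per box
    print(serial_makespan(8, stages))     # 32.0
    print(pipelined_makespan(8, stages))  # 18.0
```

Once the pipeline is full, throughput is bounded by the slowest stage rather than by the sum of all stages. This is why splitting host-resident data into boxes and streaming them can raise device occupancy. GPUs with a single copy engine serialize host-to-device and device-to-host copies, and this sketch does not model that case.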
Acknowledgements
This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Farooqi, M.N., Nguyen, T., Zhang, W., Almgren, A.S., Shalf, J., Unat, D. (2019). Asynchronous AMR on Multi-GPUs. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_11
DOI: https://doi.org/10.1007/978-3-030-34356-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)