Asynchronous AMR on Multi-GPUs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

Adaptive Mesh Refinement (AMR) is a computationally efficient and memory-efficient technique for solving partial differential equations. Because many supercomputers now include GPUs, AMR frameworks must evolve to target large-scale heterogeneous systems. However, using multiple GPUs while achieving good scalability is challenging in AMR because of its complex communication pattern. In this paper, we present an asynchronous AMR runtime system that simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between the different processing units. The runtime adapts to various machine configurations and uses a host-resident data model. It facilitates the use of streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies with an advection solver on the Piz Daint supercomputer and achieve high performance.
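The benefit of overlapping CPU-GPU transfers with computation can be illustrated with a toy timing model. The sketch below is not the paper's runtime. It assumes an idealized GPU with one host-to-device copy engine, one compute engine, and one device-to-host copy engine, and it assumes patches are issued in order. The names `PatchCost`, `serialized_time`, and `overlapped_time`, as well as the millisecond costs, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchCost:
    """Hypothetical per-patch costs in milliseconds (illustrative values only)."""
    h2d: float     # host-to-device copy of the patch data
    kernel: float  # GPU computation on the patch
    d2h: float     # device-to-host copy of the result


def serialized_time(patches):
    """Single-stream case: every copy and kernel runs back to back."""
    return sum(p.h2d + p.kernel + p.d2h for p in patches)


def overlapped_time(patches):
    """Idealized multi-stream case (an assumption, not a measured model).

    Each engine handles one patch at a time, but different patches may
    occupy different engines simultaneously, as with one stream per patch.
    """
    h2d_done = kernel_done = d2h_done = 0.0
    for p in patches:
        # The copy-in starts as soon as the H2D engine is free.
        h2d_done = h2d_done + p.h2d
        # The kernel waits for its input and for the compute engine.
        kernel_done = max(kernel_done, h2d_done) + p.kernel
        # The copy-out waits for the kernel and for the D2H engine.
        d2h_done = max(d2h_done, kernel_done) + p.d2h
    return d2h_done


# Eight identical patches: 1 ms in, 2 ms compute, 1 ms out (made-up numbers).
patches = [PatchCost(h2d=1.0, kernel=2.0, d2h=1.0)] * 8
print("serialized:", serialized_time(patches))
print("overlapped:", overlapped_time(patches))
```

With these made-up costs the serialized schedule takes 32 ms and the overlapped one 18 ms, because all transfers except the first copy-in and the last copy-out hide behind kernel execution. Real behavior depends on the hardware, the number of copy engines, and the actual transfer and kernel times.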

Acknowledgements

This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87.

Author information

Corresponding author

Correspondence to Muhammad Nufail Farooqi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Farooqi, M.N., Nguyen, T., Zhang, W., Almgren, A.S., Shalf, J., Unat, D. (2019). Asynchronous AMR on Multi-GPUs. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_11

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science, Computer Science (R0)
