DOI: 10.1145/3343211.3343214

Persistent coarrays: integrating MPI storage windows in coarray fortran

Published: 11 September 2019

ABSTRACT

The integration of novel hardware and software components in HPC systems is expected to considerably reduce the Mean Time Between Failures (MTBF) experienced by scientific applications, while simultaneously increasing the programming complexity of these clusters. In this work, we present the initial steps towards integrating transparent resilience support inside Coarray Fortran. In particular, we propose persistent coarrays, an extension of OpenCoarrays that integrates MPI storage windows to leverage its transport layer and seamlessly map coarrays to files on storage. Preliminary results indicate that our approach provides clear benefits on representative workloads, while incurring minimal source-code changes.
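To make the programming model concrete, below is a minimal sketch in standard Coarray Fortran. The program, its variable names, and the ring-neighbor exchange are illustrative assumptions, not code from the paper. Under the proposed persistent coarrays, the OpenCoarrays transport layer would back the coarray with an MPI storage window, transparently mapping it to a file on storage:

    program persistence_sketch
      implicit none
      integer, parameter :: n = 8
      ! With persistent coarrays, this coarray would be transparently
      ! backed by an MPI storage window, i.e., mapped to a file on
      ! storage by the OpenCoarrays transport layer.
      real :: slab(n)[*]
      integer :: me, right

      me = this_image()
      slab = real(me)                        ! fill the local slab

      sync all                               ! make all slabs visible
      right = merge(1, me + 1, me == num_images())
      slab(1) = slab(n)[right]               ! one-sided get from a neighbor
      sync all

      ! On a restart after failure, a persistent coarray would already
      ! contain the values written during the previous run.
    end program persistence_sketch

Note that the fragment is ordinary Fortran 2008; any persistence would come entirely from the transport layer underneath, consistent with the abstract's claim of minimal source-code changes.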

Published in:
EuroMPI '19: Proceedings of the 26th European MPI Users' Group Meeting
September 2019, 134 pages
ISBN: 9781450371759
DOI: 10.1145/3343211

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article

Acceptance Rates:
EuroMPI '19 paper acceptance rate: 13 of 26 submissions, 50%
Overall acceptance rate: 66 of 139 submissions, 47%
