Persistent Coarrays: Integrating MPI Storage Windows in Coarray Fortran

ABSTRACT
The integration of novel hardware and software components in upcoming HPC systems is expected to considerably reduce the Mean Time Between Failures (MTBF) observed by scientific applications, while simultaneously increasing the programming complexity of these clusters. In this work, we present the initial steps towards the integration of transparent resilience support inside Coarray Fortran. In particular, we propose persistent coarrays, an extension of OpenCoarrays that integrates MPI storage windows to leverage its MPI-based transport layer and seamlessly map coarrays to files on storage. Preliminary results indicate that our approach provides clear benefits on representative workloads, while requiring only minimal source code changes.
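To illustrate why this matters at the application level, the sketch below shows a standard Fortran 2008 coarray kernel of the kind the abstract targets. The program name, array name, and sizes are illustrative, and the mechanism by which persistence is enabled (e.g., a launch option or environment variable consumed by the runtime) is an assumption here; the point is that the coarray declarations and one-sided accesses stay unmodified standard Fortran, with the modified OpenCoarrays runtime transparently backing the coarray by an MPI storage window mapped to a file.

```fortran
! Minimal sketch of a coarray kernel that would benefit from
! transparent persistence. All syntax below is standard Fortran
! 2008; mapping "field" to a file on storage would be done by the
! runtime (hypothetically selected at launch time), not by new
! syntax in the source.
program persistent_demo
  implicit none
  integer, parameter :: n = 1024
  real    :: field(n)[*]          ! coarray: one slice per image
  integer :: me, right, step

  me    = this_image()
  right = merge(1, me + 1, me == num_images())  ! ring neighbor

  field = real(me)                ! initialize the local slice
  sync all                        ! make initial values visible

  do step = 1, 10
     field(1) = field(n)[right]   ! one-sided get from neighbor
     sync all                     ! image barrier each step
  end do

  ! With persistent coarrays, "field" would already reside in an
  ! MPI storage window mapped to a file, so its contents survive
  ! the run without an explicit checkpoint call.
end program persistent_demo
```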