ABSTRACT
Implementing an in situ workflow involves several challenges related to data placement, task scheduling, efficient communications, scalability, and reliability. Most current implementations address these issues with reasonable performance by focusing on high-performance communications and low-overhead execution models, but at the cost of reliability and flexibility.
One of the key design choices in such infrastructures is whether to provide a single-program, integrated environment or a multiple-program, connected environment; each approach has its own strengths and weaknesses. While these approaches might be appropriate for current production systems, the expected characteristics of exascale machines will shift current priorities.
After surveying the trade-offs and challenges of the integrated and connected in situ workflow solutions available today, we discuss how exascale systems will impact those designs. In particular, we identify features missing from current system-level software that are required for the evolution of in situ workflows toward exascale, and we describe how system software innovations from the Argo Exascale Computing Project can help address those challenges.
Index Terms
- In Situ Workflows at Exascale: System Software to the Rescue