ABSTRACT
In an in-situ workflow, multiple components such as simulation and analysis applications are coupled via streaming data transfers. The multiplicity of possible configurations necessitates an auto-tuner for workflow optimization. Existing auto-tuning approaches are computationally expensive because many configurations must be sampled, by running the whole workflow repeatedly, in order to train the auto-tuner's surrogate model or otherwise explore the configuration space. To reduce these costs, we instead combine the performance models of the component applications by exploiting the analytical structure of the workflow, and use the combined model to selectively generate test configurations whose measurements guide the training of a machine-learning workflow surrogate model. Because training can focus on well-performing configurations, the resulting surrogate model achieves high prediction accuracy for good configurations despite being trained on fewer configurations in total. Experiments with real applications demonstrate that, for a fixed compute-time budget, our approach identifies significantly better configurations than other approaches.
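The core idea can be illustrated with a minimal sketch. The component models (`sim_time`, `analysis_time`), the cost constants, and the configuration grid below are all hypothetical placeholders, not the paper's actual models; the sketch only shows how analytical models of concurrently running components might be combined, per the workflow structure, to rank candidate configurations so that only the most promising ones are measured to train a surrogate.

```python
# Hypothetical analytical models of each component's runtime as a
# function of its core count (illustrative cost constants only).
def sim_time(cores):
    return 1000.0 / cores + 0.02 * cores      # compute term + overhead term

def analysis_time(cores):
    return 400.0 / cores + 0.01 * cores

def transfer_time(stride):
    return 5.0 / stride                       # less frequent output -> cheaper coupling

def workflow_time(sim_cores, ana_cores, stride):
    # In an in-situ workflow the components run concurrently and exchange
    # data by streaming, so the slower stage dominates the pipeline; the
    # coupling (data-transfer) cost is then added on top.
    return max(sim_time(sim_cores), analysis_time(ana_cores)) + transfer_time(stride)

# Enumerate candidate configurations and rank them with the combined model.
candidates = [(s, a, t) for s in (32, 64, 128, 256)
                        for a in (8, 16, 32, 64)
                        for t in (1, 5, 10)]
ranked = sorted(candidates, key=lambda c: workflow_time(*c))

# Only the most promising fraction would actually be run and measured;
# those measurements then train the workflow surrogate model, which can
# therefore concentrate its accuracy on well-performing configurations.
to_measure = ranked[:len(ranked) // 4]
```

The combination rule (`max` of overlapped stages plus transfer cost) is one plausible instance of exploiting workflow structure; the actual combination depends on how the components overlap and synchronize in a given workflow.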