DOI: 10.1145/3458817.3476197

Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications

Published: 13 November 2021

ABSTRACT

In an in-situ workflow, multiple components such as simulation and analysis applications are coupled with streaming data transfers. The multiplicity of possible configurations necessitates an auto-tuner for workflow optimization. Existing auto-tuning approaches are computationally expensive because many configurations must be sampled by running the whole workflow repeatedly in order to train the auto-tuner surrogate model or otherwise explore the configuration space. To reduce these costs, we instead combine the performance models of component applications by exploiting the analytical workflow structure, selectively generating test configurations to measure and guide the training of a machine learning workflow surrogate model. Because the training can focus on well-performing configurations, the resulting surrogate model can achieve high prediction accuracy for good configurations despite training with fewer total configurations. Experiments with real applications demonstrate that our approach can identify significantly better configurations than other approaches for a fixed computer time budget.
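To make the idea concrete, here is a minimal, purely illustrative Python sketch of the bootstrapping step the abstract describes: analytical performance models of the component applications are combined through the workflow's coupling structure, and only the configurations that the combined estimate ranks as promising are selected for measurement and surrogate training. The model forms, parameter names, and numbers below are assumptions for illustration, not taken from the paper.

```python
import itertools

# Hypothetical per-component performance models (assumed analytic forms):
# predicted time per step as a function of the component's process count.
def sim_model(procs_sim):
    return 1000.0 / procs_sim + 0.2 * procs_sim   # compute term + overhead term

def ana_model(procs_ana):
    return 400.0 / procs_ana + 0.1 * procs_ana

def combined_estimate(cfg):
    """Combine the component models via the workflow structure: with a
    streaming coupling, each step is bottlenecked by the slower component."""
    procs_sim, procs_ana = cfg
    return max(sim_model(procs_sim), ana_model(procs_ana))

def bootstrap_samples(space, budget):
    """Rank all configurations by the analytical combined estimate and
    select only the most promising ones to run and measure, so that
    surrogate training concentrates on well-performing configurations."""
    ranked = sorted(space, key=combined_estimate)
    return ranked[:budget]

# Configuration space: (simulation processes, analysis processes).
space = list(itertools.product([8, 16, 32, 64], [2, 4, 8, 16]))
to_measure = bootstrap_samples(space, budget=4)
```

In a full auto-tuner, the selected configurations would be executed, their measured workflow times recorded, and a machine-learning surrogate fitted to those measurements; the sketch stops at the selection step, which is where the analytical combination of component models reduces the sampling cost.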

Supplemental Material: Bootstrapping In-Situ Workflow Auto-Tuning via Combining Performance Models of Component Applications.mp4 (137.6 MB)


Published in:
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021, 1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817

Copyright © 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States

Article type: research-article

Overall acceptance rate: 1,516 of 6,373 submissions, 24%
