ABSTRACT
Like most modern computer systems, High Performance Computing (HPC) machines integrate many highly configurable hardware devices and software components. Finding their optimal parametrization is a complex task: the size of the parametric space and the non-linear behavior of HPC systems make hand tuning, theoretical modeling, and exhaustive sampling unsuitable in most cases. Auto-tuning methods relying on black-box optimization have emerged as a promising way to find a system's best parametrization without making any assumptions about its behavior. In this paper, we present the architecture of an auto-tuning framework, called Smart HPC Application MANager (SHAMan), that integrates black-box optimization heuristics to find the optimal parametrization of an Input/Output (I/O) accelerator for an HPC application. We describe the conceptual and technical architecture of the framework and its native support for the HPC cluster ecosystem. We detail the stand-alone optimization engine and its integration as a service provided by a Web application. We deployed and tested the framework by tuning an I/O accelerator developed by Atos on an HPC cluster running in production. The tuner's performance is evaluated by optimizing 90 different I/O-oriented applications. We show a median improvement of 29% in speed-up compared to the default parametrization, and this improvement goes up to 98% for a certain class of applications.
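To make the idea of black-box auto-tuning concrete, the following is a minimal sketch of a tuning loop, not SHAMan's actual API or heuristics. It uses plain random search as a stand-in for the optimization heuristic, and the parameter names (`cache_size_mb`, `worker_threads`), the grid values, and the `toy_run_time` function are all hypothetical. In a real deployment, the objective would launch the application on the cluster and return its measured execution time.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def random_search(
    run_application: Callable[[Params], float],
    grid: Dict[str, List[int]],
    default: Params,
    budget: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Black-box tuning loop: only the measured run time is observed.

    The default parametrization is evaluated first, so the returned
    configuration is never worse than the default.
    """
    rng = random.Random(seed)
    best_params, best_time = dict(default), run_application(default)
    for _ in range(budget):
        # Draw one value per parameter, uniformly from its allowed values.
        candidate = {name: rng.choice(values) for name, values in grid.items()}
        elapsed = run_application(candidate)
        if elapsed < best_time:
            best_params, best_time = candidate, elapsed
    return best_params, best_time


def toy_run_time(params: Params) -> float:
    """Hypothetical stand-in for running an application and timing it."""
    return (
        10.0
        + abs(params["cache_size_mb"] - 512) / 64
        + abs(params["worker_threads"] - 8)
    )


# Hypothetical parameter grid and default configuration.
GRID = {
    "cache_size_mb": [64, 128, 256, 512, 1024],
    "worker_threads": [1, 2, 4, 8, 16],
}
DEFAULT = {"cache_size_mb": 64, "worker_threads": 1}

if __name__ == "__main__":
    best, elapsed = random_search(toy_run_time, GRID, DEFAULT, budget=30)
    print(f"default: {toy_run_time(DEFAULT):.2f}s, tuned: {elapsed:.2f}s with {best}")
```

The loop never inspects how the objective computes its value, which is what makes the approach black-box. Smarter heuristics would replace only the way the next candidate is chosen.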
Index Terms
- SHAMan: an intelligent framework for HPC auto-tuning of I/O accelerators