Abstract
While performance remains a major objective in high-performance computing (HPC), future systems will have to deliver the desired performance under both reliability and energy constraints. Although a number of resilience methods and power management techniques have been proposed to address reliability and energy concerns, the trade-offs among performance, power, and resilience are not well understood, especially in HPC systems of unprecedented scale and complexity. In this work, we present a co-modeling mechanism named TOPPER (system-wide Trade-Off modeling for Performance, PowEr, and Resilience). TOPPER is built with colored Petri nets, which allow us to capture the dynamic, complicated interactions and dependencies among factors such as workload characteristics, hardware reliability, and runtime system operation on a petascale machine. Using system traces collected from a production supercomputer, we conducted a series of experiments to analyze various resilience methods, power capping techniques, and job characteristics in terms of system-wide performance and energy consumption. Our results provide insights into performance–power–resilience trade-offs on HPC systems.
Acknowledgements
This work is supported in part by US National Science Foundation Grants CCF-1618776 and CCF-1422009. It used data from the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Cite this article
Yu, L., Zhou, Z., Fan, Y. et al. System-wide trade-off modeling of performance, power, and resilience on petascale systems. J Supercomput 74, 3168–3192 (2018). https://doi.org/10.1007/s11227-018-2368-8
DOI: https://doi.org/10.1007/s11227-018-2368-8