HPC node performance and energy modeling with the co-location of applications

Dauwe, Daniel; Jonardi, Eric; Friese, Ryan D.; Pasricha, Sudeep; Maciejewski, Anthony A.; Bader, David A.; Siegel, Howard Jay

doi:10.1007/s11227-016-1783-y

Published: 24 June 2016

The Journal of Supercomputing, Volume 72, pages 4771–4809 (2016)

Daniel Dauwe¹, Eric Jonardi¹, Ryan D. Friese¹, Sudeep Pasricha¹,², Anthony A. Maciejewski¹, David A. Bader³ & Howard Jay Siegel¹,²

Abstract

Multicore processors have become an integral part of modern large-scale and high-performance parallel and distributed computing systems. Unfortunately, applications co-located on multicore processors can suffer from decreased performance and increased dynamic energy use as a result of interference in shared resources, such as memory. As this interference is difficult to characterize, assumptions about application execution time and energy usage can be misleading in the presence of co-location. Consequently, it is important to accurately characterize the performance and energy usage of applications that execute in a co-located manner on these architectures. This work investigates some of the disadvantages of co-location, and presents a methodology for building models capable of utilizing varying amounts of information about a target application and its co-located applications to make predictions about the target application’s execution time and the system’s energy use under arbitrary co-locations of a wide range of application types. The proposed methodology is validated on three different server class Intel Xeon multicore processors using eleven applications from two scientific benchmark suites. The model’s utility for scheduling is also demonstrated in a simulated large-scale high-performance computing environment through the creation of a co-location aware scheduling heuristic. This heuristic demonstrates that scheduling using information generated with the proposed modeling methodology is capable of making significant improvements over a scheduling heuristic that is oblivious to co-location interference.
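The contrast between a co-location aware scheduling heuristic and one that is oblivious to interference can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the linear interference formula in `predicted_slowdown`, the single "memory intensity" number per application, the `Node` type, and the greedy and first-fit placement rules are all hypothetical stand-ins for the models and heuristics the article describes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A multicore node; `apps` holds the memory intensity of each placed app."""
    cores: int
    apps: List[float] = field(default_factory=list)


def predicted_slowdown(target: float, co_runners: List[float]) -> float:
    # Hypothetical linear model (an assumption, not the article's model):
    # slowdown grows with the target's memory intensity multiplied by the
    # combined memory intensity of the applications sharing the node.
    return 1.0 + target * sum(co_runners)


def slowdown_on(apps: List[float]) -> float:
    """Sum of predicted slowdown factors for all apps sharing one node."""
    return sum(
        predicted_slowdown(a, apps[:i] + apps[i + 1:])
        for i, a in enumerate(apps)
    )


def total_slowdown(nodes: List[Node]) -> float:
    """Sum of predicted slowdown factors over every node."""
    return sum(slowdown_on(n.apps) for n in nodes)


def schedule_oblivious(apps: List[float], nodes: List[Node]) -> None:
    # First fit: place each app on the first node with a free core,
    # ignoring which applications are already running there.
    for a in apps:
        node = next(n for n in nodes if len(n.apps) < n.cores)
        node.apps.append(a)


def schedule_aware(apps: List[float], nodes: List[Node]) -> None:
    # Greedy: place each app on the node where the predicted increase in
    # total slowdown is smallest.
    for a in apps:
        free = [n for n in nodes if len(n.apps) < n.cores]
        best = min(
            free,
            key=lambda n: slowdown_on(n.apps + [a]) - slowdown_on(n.apps),
        )
        best.apps.append(a)


if __name__ == "__main__":
    workload = [0.9, 0.9, 0.1, 0.1]  # made-up memory intensities
    oblivious = [Node(cores=2), Node(cores=2)]
    aware = [Node(cores=2), Node(cores=2)]
    schedule_oblivious(workload, oblivious)
    schedule_aware(workload, aware)
    print("oblivious:", total_slowdown(oblivious))
    print("aware:    ", total_slowdown(aware))
```

With this made-up workload, first fit pairs the two memory-intensive applications on one node, while the greedy rule pairs each of them with a lightweight application, which gives a lower predicted total slowdown under the assumed model.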

References

Verma A, Ahuja P, Neogi A (2008) Power-aware dynamic placement of HPC applications. In: 22nd Annual International Conference on Supercomputing (ICS ’08), pp 175–184
Zhu Q, Zhu J, Agrawal G (2010) Power-aware consolidation of scientific workflows in virtualized environments. In: ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’10), pp 1–12
Tang L, Mars J, Vachharajani N, Hundt R, Soffa M (2011) The impact of memory subsystem resource sharing on datacenter applications. In: 38th Annual International Symposium on Computer Architecture (ISCA ’11), pp 283–294
Sandberg A, Sembrant A, Hagersten E, Black-Schaffer D (2013) Modeling performance variation due to cache sharing. In: IEEE 19th International Symposium on High Performance Computer Architecture (HPCA ’13), pp 155–166
Choi J, Dukhan M, Liu X, Vuduc R (2014) Algorithmic time, energy, and power on candidate HPC compute building blocks. In: IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS ’14), pp 447–457
Dauwe D, Friese R, Pasricha S, Maciejewski AA, Koenig GA, Siegel HJ (2014) Modeling the effects on power and performance from memory interference of co-located applications in multicore systems. In: The 2014 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA ’14), pp 3–9
Subramanian L, Seshadri V, Ghosh A, Khan S, Mutlu O (2015) The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory. In: 48th International Symposium on Microarchitecture (MICRO-48 ’15), pp 62–75
Merkel A, Stoess J, Bellosa F (2010) Resource-conscious scheduling for energy efficiency on multicore processors. In: 5th European Conference on Computer Systems (EuroSys ’10), pp 153–166
Luque C, Moreto M, Cazorla FJ, Gioiosa R, Buyuktosunoglu A, Valero M (2012) CPU accounting for multicore processors. IEEE Trans Comput 61(2):251–264
Mars J, Tang L, Hundt R, Skadron K, Soffa M (2011) Bubble-up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: IEEE/ACM 44th International Symposium on Microarchitecture (MICRO ’11), pp 248–259
Dwyer T, Fedorova A, Blagodurov S, Roth M, Gaud F, Pei J (2013) A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads. In: ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), pp 83:1–83:11
Cazorla FJ, Ramirez A, Valero M, Fernandez E (2004) Dynamically controlled resource allocation in SMT processors. In: 37th International Symposium on Microarchitecture (MICRO-37 ’04), pp 171–182
De Vuyst M, Kumar R, Tullsen DM (2006) Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In: IEEE 20th International Parallel and Distributed Processing Symposium (IPDPS ’06), pp 10–20
Feliu J, Sahuquillo J, Petit S, Duato J (2015) Addressing fairness in SMT multicores with a progress-aware scheduler. In: IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS ’15), pp 187–196
Young BD, Apodaca J, Briceño LD, Smith J, Pasricha S, Maciejewski AA, Siegel HJ, Khemka B, Bahirat S, Ramirez A, Zou Y (2013) Deadline and energy constrained dynamic resource allocation in a heterogeneous computing environment. J Supercomput 63(2):326–347
Al-Qawasmeh AM, Pasricha S, Maciejewski AA, Siegel HJ (2015) Power and thermal-aware workload allocation in heterogeneous data centers. IEEE Trans Comput 64(2):477–491
Khemka B, Friese R, Pasricha S, Maciejewski AA, Siegel HJ, Koenig GA, Powers S, Hilton M, Rambharos R, Poole S (2015) Utility maximizing dynamic resource management in an oversubscribed energy-constrained heterogeneous computing system. Sustain Comput Inf Syst 5:14–30
Oxley M, Pasricha S, Maciejewski AA, Siegel HJ, Apodaca J, Young D, Briceño L, Smith J, Bahirat S, Khemka B, Ramirez A, Zou Y (2015) Makespan and energy robust stochastic static resource allocation of bags-of-tasks to a heterogeneous computing system. IEEE Trans Parallel Distrib Syst 2791–2805
Talby D, Feitelson DG (1999) Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling. In: 13th International Parallel Processing Symposium (IPPS ’99), pp 513–517
Sadhasivam S, Nagaveni N, Jayarani R, Ram RV (2009) Design and implementation of an efficient two-level scheduler for cloud computing environment. In: International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom ’09), pp 884–886
Utrera G, Corbalan J, Labarta J (2014) Scheduling parallel jobs on multicore clusters using CPU oversubscription. J Supercomput 68(3):1113–1140
Lifka DA (1995) The ANL/IBM SP scheduling system. In: Job scheduling strategies for parallel processing, pp 295–303
Jolliffe I (2002) Principal component analysis. Wiley, Hoboken, NJ
Chong EK, Zak SH (2013) An introduction to optimization. Wiley, Hoboken, NJ
LeCun YA, Bottou L, Orr GB, Müller K (2012) Efficient backprop. In: Neural networks: tricks of the trade. Springer, New York
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York, NY
Ubuntu 14 Release Notes. https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes. Accessed Jan 2016
Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes 1,2A,2B,2C,3A,3B,3C and 3D, Technical Report 2015. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462. Accessed Jan 2016
Intel Xeon E3-1225v3 Processor http://ark.intel.com/products/75461/. Accessed Jan 2016
Intel Xeon E5649 Processor http://ark.intel.com/products/52581/. Accessed Jan 2016
Intel Xeon E5-2697v2 Processor http://ark.intel.com/products/75283/. Accessed Jan 2016
Performance application programming interface http://icl.cs.utk.edu/papi/. Accessed Jan 2016
HPCToolkit http://hpctoolkit.org/. Accessed Jan 2016
Watts Up? Plug Load Meters https://www.wattsupmeters.com/secure/products.php?pn=0. Accessed Jan 2016
PARSEC Benchmark Suite http://parsec.cs.princeton.edu/. Accessed Jan 2016
NAS Parallel Benchmarks http://www.nas.nasa.gov/publications/npb.html. Accessed Jan 2016
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, New York, NY
Khemka B, Friese R, Pasricha S, Maciejewski AA, Siegel HJ, Koenig GA, Powers S, Hilton M, Rambharos R, Wright M, Poole S (2015) Comparison of energy-constrained resource allocation heuristics under different task management environments. In: The 2015 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2015), pp 3–12
Khemka B, Friese R, Briceno LD, Siegel HJ, Maciejewski AA, Koenig GA, Groer C, Okonski G, Hilton MM, Rambharos R, Poole S (2015) Utility functions and resource management in an oversubscribed heterogeneous computing environment. IEEE Trans Comput 64(8):2394–2407
Dauwe D, Jonardi E, Friese R, Pasricha S, Maciejewski AA, Bader DA, Siegel HJ (2015) A methodology for co-location aware application performance modeling in multicore computing. In: 17th Workshop on Advances on Parallel and Distributed Computing Models (APDCM ’15), pp 434–443

Acknowledgments

The authors thank Mark Oxley for his valuable comments on this research. This work was supported by the National Science Foundation (NSF) under Grant Numbers CNS-0905339, CCF-1252500, CCF-1302693, and ACI-1339745, and by an NSF Graduate Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. The authors thank Hewlett Packard (HP) of Fort Collins for providing some of the machines used for testing. Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy under contract DE-AC05-76RL01830. A preliminary version of portions of this work appeared in [40] (Dauwe et al. 2015, APDCM ’15). The additions to this work include creating an additional set of models for energy use prediction, validating the execution time and energy use prediction models on an additional multicore processor, and creating and analyzing a co-location aware scheduling heuristic that uses prediction models generated by our modeling methodology to make intelligent co-location decisions.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA
Daniel Dauwe, Eric Jonardi, Ryan D. Friese, Sudeep Pasricha, Anthony A. Maciejewski & Howard Jay Siegel
Department of Computer Science, Colorado State University, Fort Collins, CO, USA
Sudeep Pasricha & Howard Jay Siegel
College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
David A. Bader

Corresponding author

Correspondence to Daniel Dauwe.

About this article

Cite this article

Dauwe, D., Jonardi, E., Friese, R.D. et al. HPC node performance and energy modeling with the co-location of applications. J Supercomput 72, 4771–4809 (2016). https://doi.org/10.1007/s11227-016-1783-y

Published: 24 June 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11227-016-1783-y
