Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems

Evans, Jeffrey J.; Lucas, Charles E.

doi:10.1007/s10586-011-0193-4

Published: 17 December 2011

Cluster Computing, Volume 16, pages 91–115 (2013)

Abstract

Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times directly contribute to proportionally higher energy consumption, potentially negating efforts by applications, or the HPC system, to optimize energy consumption using low-level control techniques, such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.
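The link between run time and energy can be made concrete with a small calculation. The sketch below is illustrative only. It simplifies energy to average power multiplied by run time. The `RunProfile` type, the helper functions, and every number are hypothetical assumptions: 200 W average power, a DVFS setting that cuts power by 15% at a 5% slowdown, and a further 20% slowdown from subsystem interference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunProfile:
    """Hypothetical summary of one application run (illustrative only)."""
    avg_power_w: float  # assumed average power draw over the run, in watts
    run_time_s: float   # wall-clock run time, in seconds

    def energy_j(self) -> float:
        # At a fixed average power, energy grows linearly with run time: E = P * t.
        return self.avg_power_w * self.run_time_s


def with_dvfs(base: RunProfile, power_scale: float, slowdown: float) -> RunProfile:
    """Apply an assumed DVFS setting: scale power down and stretch run time."""
    return RunProfile(base.avg_power_w * power_scale, base.run_time_s * slowdown)


def with_interference(run: RunProfile, extra_slowdown: float) -> RunProfile:
    """Extend run time by an assumed interference factor at unchanged average power."""
    return RunProfile(run.avg_power_w, run.run_time_s * extra_slowdown)


baseline = RunProfile(avg_power_w=200.0, run_time_s=1000.0)
dvfs = with_dvfs(baseline, power_scale=0.85, slowdown=1.05)
dvfs_contended = with_interference(dvfs, extra_slowdown=1.20)

for label, run in [("baseline", baseline), ("DVFS", dvfs),
                   ("DVFS + interference", dvfs_contended)]:
    print(f"{label:>20}: {run.energy_j() / 1e3:.1f} kJ")
```

With these assumed numbers, DVFS alone lowers energy from 200 kJ to about 178.5 kJ. The additional 20% run time extension then raises it to about 214.2 kJ, which is above the baseline. This is the negating effect the abstract describes.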

We have been studying run time variability in terms of communication time, from the perspective of the application, with emphasis on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of how interactions among HPC subsystems affect parallel applications. In this context, the set of applications executing on the HPC system is treated as a subsystem, alongside more traditional subsystems such as the communication and storage subsystems.

To gain insight into the run time variability problem, our earlier work developed PACE, a framework for emulating parallel applications that stresses the communication subsystem. A tool called PARSE, which builds on PACE, evaluates how sensitive the run times of real applications are to network performance.

In this paper, we propose a model of application-level behavioral attributes. Together, these attributes describe an application's run time performance as a function of two factors: its process distribution on the system (spatial locality) and its subsystem interactions (communication subsystem degradation). These subsystem interactions arise when multiple applications execute concurrently on the same HPC system. We also revisit our evaluation framework and tools to show how flexible our application characterization techniques are and how easily the attributes can be quantified. We demonstrate the validity of the model by applying our tools to several parallel benchmarks and application fragments. The results suggest that application-level behavioral attributes can be expressed as a tuple of numeric values that describe coarse-grained performance behavior.
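The abstract does not define the individual attributes. The following is therefore only a hypothetical sketch of how a tuple of coarse-grained numeric attributes might be computed from run time measurements. It uses three assumed attributes:

- the relative slowdown of a spread process placement compared with a packed one;
- the least-squares slope of normalized run time against an injected communication degradation level;
- the coefficient of variation of baseline run times.

These attributes, the `characterize` function, and the sample measurements are illustrative assumptions. They are not the definitions used in the paper.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class BehavioralAttributes:
    """Hypothetical attribute tuple; the fields are illustrative, not the paper's."""
    spatial_sensitivity: float      # relative slowdown, spread vs. packed placement
    degradation_sensitivity: float  # slope of normalized run time vs. degradation level
    variability: float              # coefficient of variation of baseline run times

    def as_tuple(self) -> tuple[float, float, float]:
        return (self.spatial_sensitivity, self.degradation_sensitivity, self.variability)


def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def characterize(packed_s: list[float], spread_s: list[float],
                 levels: list[float], degraded_s: list[float]) -> BehavioralAttributes:
    """Reduce hypothetical run time measurements (seconds) to an attribute tuple.

    packed_s:   repeated runs with processes packed onto few nodes (baseline)
    spread_s:   repeated runs with processes spread across many nodes
    levels:     assumed injected communication degradation levels (0 = none)
    degraded_s: one run time per degradation level
    """
    baseline = mean(packed_s)
    spatial = mean(spread_s) / baseline - 1.0
    normalized = [t / baseline for t in degraded_s]
    slope = least_squares_slope(levels, normalized)
    cv = stdev(packed_s) / baseline
    return BehavioralAttributes(spatial, slope, cv)


# Hypothetical measurements, invented for illustration only.
packed = [100.0, 102.0, 98.0]
spread = [110.0, 112.0, 108.0]
levels = [0.0, 0.1, 0.2, 0.3]
degraded = [100.0, 105.0, 110.0, 115.0]

attrs = characterize(packed, spread, levels, degraded)
print(tuple(round(v, 3) for v in attrs.as_tuple()))
```

With the sample data, the tuple is roughly (0.10, 0.50, 0.02). That corresponds to a 10% slowdown from spreading processes, about 0.5 units of normalized run time per unit of degradation, and 2% run-to-run variation. Keeping each attribute a plain number makes the tuples easy to compare across applications.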

Acknowledgements

This material is based upon work supported by the Department of Energy under award number DE-SC0004596.

Author information

Authors and Affiliations

Purdue University, 401 N. Grant Street, West Lafayette, IN, 47907, USA
Jeffrey J. Evans
PC Krause and Associates, Inc., 3000 Kent Avenue, Suite C1-100, West Lafayette, IN, 47906, USA
Charles E. Lucas

Corresponding author

Correspondence to Jeffrey J. Evans.

About this article

Cite this article

Evans, J.J., Lucas, C.E. Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems. Cluster Comput 16, 91–115 (2013). https://doi.org/10.1007/s10586-011-0193-4

Received: 19 May 2011
Accepted: 17 November 2011
Published: 17 December 2011
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10586-011-0193-4
