Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems

Published in: Cluster Computing

Abstract

Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times directly contribute to proportionally higher energy consumption, potentially negating efforts by applications, or the HPC system, to optimize energy consumption using low-level control techniques, such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.

We have been studying run time variability in terms of communication time, from the perspective of the application, focusing on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of the effects of HPC subsystem interactions on parallel applications. In this context, the set of executing applications on the HPC system is treated as a subsystem, along with more traditional subsystems like the communication subsystem, storage subsystem, etc.

To gain insight into the run time variability problem, our earlier work developed a framework for emulating parallel applications (PACE) that stresses the communication subsystem. Run time sensitivity of real applications to network performance is evaluated with a tool called PARSE, which uses PACE. In this paper, we propose a model defining application-level behavioral attributes that collectively describe how applications behave in terms of their run time performance, as functions of their process distribution on the system (spatial locality) and their subsystem interactions (communication subsystem degradation). These subsystem interactions arise when multiple applications execute concurrently on the same HPC system. We also revisit our evaluation framework and tools to demonstrate the flexibility of our application characterization techniques and the ease with which attributes can be quantified. The validity of the model is demonstrated using our tools with several parallel benchmarks and application fragments. Results suggest that application-level behavioral attributes can be articulated as a tuple of numeric values that describes coarse-grained performance behavior.
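The idea of expressing behavioral attributes as a tuple of numeric values can be sketched concretely. The following is a minimal illustration only, not the paper's actual model: the field names, the choice of two attributes, and the relative-slowdown formula are assumptions made here for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehavioralAttributes:
    """Hypothetical tuple of application-level behavioral attributes.

    Each field is a coarse-grained, dimensionless descriptor of how an
    application's run time responds to its environment. The field names
    are illustrative, not the paper's notation.
    """
    locality_sensitivity: float  # run time change vs. process spread
    network_sensitivity: float   # run time change vs. comm. degradation


def sensitivity(baseline_s: float, perturbed_s: float) -> float:
    """Relative run time slowdown under a perturbed subsystem.

    0.0 means the application is insensitive to the perturbation;
    1.0 means its run time doubled.
    """
    return (perturbed_s - baseline_s) / baseline_s


# Example: an application whose run time grows from 100 s to 130 s under
# injected network congestion, and from 100 s to 105 s when its processes
# are spread across distant nodes (numbers are invented for illustration).
attrs = BehavioralAttributes(
    locality_sensitivity=sensitivity(100.0, 105.0),
    network_sensitivity=sensitivity(100.0, 130.0),
)
print(attrs)
```

A tuple like this could then be compared across applications, or used by a scheduler to decide which concurrent workloads can safely share the communication subsystem.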



Acknowledgements

This material is based upon work supported by the Department of Energy under award number DE-SC0004596.

Corresponding author

Correspondence to Jeffrey J. Evans.

About this article

Cite this article

Evans, J.J., Lucas, C.E. Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems. Cluster Comput 16, 91–115 (2013). https://doi.org/10.1007/s10586-011-0193-4
