Abstract
Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times directly contribute to proportionally higher energy consumption, potentially negating efforts by applications, or the HPC system, to optimize energy consumption using low-level control techniques, such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.
We have been studying run time variability in terms of communication time, from the perspective of the application, focusing on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of the effects of HPC subsystem interactions on parallel applications. In this context, the set of executing applications on the HPC system is treated as a subsystem, along with more traditional subsystems like the communication subsystem, storage subsystem, etc.
To gain insight into the run time variability problem, our earlier work developed a framework to emulate parallel applications (PACE) that stresses the communication subsystem. Evaluation of run time sensitivity to network performance of real applications is performed with a tool called PARSE, which uses PACE. In this paper, we propose a model that defines application-level behavioral attributes, which collectively describe how applications behave in terms of their run time performance as functions of their process distribution on the system (spatial locality) and subsystem interactions (communication subsystem degradation). These subsystem interactions are produced when multiple applications execute concurrently on the same HPC system. We also revisit our evaluation framework and tools to demonstrate the flexibility of our application characterization techniques and the ease with which attributes can be quantified. The validity of the model is demonstrated using our tools with several parallel benchmarks and application fragments. Results suggest that it is possible to articulate application-level behavioral attributes as a tuple of numeric values that describe coarse-grained performance behavior.
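As a rough illustration of what a tuple of numeric attribute values might look like, the following minimal sketch summarizes an application by two numbers, one per factor named above. The abstract does not give the model's actual attribute definitions, so the class name, the field names, and the relative-slowdown measure are all assumptions made for illustration, and the run times in the usage lines are invented, not measured.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BehavioralAttributes:
    """Hypothetical attribute tuple; field names are illustrative assumptions."""
    locality_sensitivity: float      # slowdown when processes are spread out
    degradation_sensitivity: float   # slowdown under communication degradation


def relative_slowdown(baseline, perturbed):
    """Fractional increase in mean run time relative to the baseline runs."""
    base = mean(baseline)
    return (mean(perturbed) - base) / base


def characterize(baseline, scattered, degraded):
    """Build the attribute tuple from three sets of run times (seconds)."""
    return BehavioralAttributes(
        locality_sensitivity=relative_slowdown(baseline, scattered),
        degradation_sensitivity=relative_slowdown(baseline, degraded),
    )


# Invented run times, for illustration only (not results from the paper).
attrs = characterize(
    baseline=[100.0, 102.0, 98.0],
    scattered=[110.0, 112.0, 108.0],
    degraded=[150.0, 148.0, 152.0],
)
print(attrs)
```

With these made-up inputs, the application would be described as mildly sensitive to process placement and strongly sensitive to communication degradation. A scheduler could, in principle, compare such tuples across applications, though how the attributes are actually defined and used is a matter for the paper itself.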
Acknowledgements
This material is based upon work supported by the Department of Energy under award number DE-SC0004596.
Cite this article
Evans, J.J., Lucas, C.E. Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems. Cluster Comput 16, 91–115 (2013). https://doi.org/10.1007/s10586-011-0193-4
DOI: https://doi.org/10.1007/s10586-011-0193-4