Abstract
High productivity for end users is critical to harnessing the power of high performance computing systems to solve science and engineering problems, yet bridging the gap between hardware complexity and software limitations remains a challenge. Despite significant progress in languages, compilers, and performance tools, tuning an application is still largely a manual task performed mostly by experts. In this paper we propose a holistic approach to automated performance analysis and tuning that we expect to greatly improve the productivity of performance debugging. Our approach seeks to build a framework that facilitates combining expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. With our framework, once a diagnosis and tuning strategy has been developed, it can be stored in an open and extensible database and reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications, showing that the tuning process is highly automated and that the performance improvement is significant.
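The abstract does not say how stored strategies are represented. The following is a minimal Python sketch, under stated assumptions, of how a reusable database of diagnosis-and-tuning strategies could work: each entry pairs a diagnosis (a predicate over measured performance metrics) with a tuning action, and a new profile is matched against every stored entry. The `Strategy` and `StrategyDB` names, the metric keys, and the thresholds are all hypothetical illustrations, not the framework described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical profile format: metric name -> measured value.
Metrics = Dict[str, float]


@dataclass
class Strategy:
    """One reusable diagnosis-and-tuning entry (hypothetical schema)."""
    name: str
    detect: Callable[[Metrics], bool]  # diagnosis: does this bottleneck apply?
    solution: str                      # tuning action to try when it does


@dataclass
class StrategyDB:
    """In-memory stand-in for an open, extensible strategy database."""
    strategies: List[Strategy] = field(default_factory=list)

    def register(self, strategy: Strategy) -> None:
        # Extending the database is just adding another entry.
        self.strategies.append(strategy)

    def diagnose(self, metrics: Metrics) -> List[Strategy]:
        # Reuse every stored strategy against a new performance profile.
        return [s for s in self.strategies if s.detect(metrics)]


db = StrategyDB()
# Metric names and thresholds below are illustrative assumptions only.
db.register(Strategy(
    name="poor cache locality",
    detect=lambda m: m.get("l1_miss_rate", 0.0) > 0.10,
    solution="apply loop tiling / blocking",
))
db.register(Strategy(
    name="MPI communication bound",
    detect=lambda m: m.get("mpi_time_fraction", 0.0) > 0.30,
    solution="overlap communication with computation",
))

profile = {"l1_miss_rate": 0.18, "mpi_time_fraction": 0.05}
for s in db.diagnose(profile):
    print(f"{s.name}: {s.solution}")
```

With the sample profile, only the cache-locality entry fires. Keeping diagnosis and solution together in one entry is what lets a strategy written once for one application be reapplied to the next.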
This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cong, G. et al. (2009). A Holistic Approach towards Automated Performance Analysis and Tuning. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_7
DOI: https://doi.org/10.1007/978-3-642-03869-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)