Abstract
This article describes how the integration of the OpenUH OpenMP compiler with the KOJAK performance analysis tool can assist developers of OpenMP and hybrid codes in optimizing their applications with as little user intervention as possible. In particular, we (i) describe how the compiler’s ability to automatically instrument user code down to the flow-graph level can improve the location of performance problems and (ii) outline how the performance feedback provided by KOJAK will direct the compiler’s optimization decisions in the future. To demonstrate our methodology, we present experimental results showing how the causes of the performance slowdown of the ASPCG benchmark could be identified.
This material is based upon work supported by the National Science Foundation under grants No. 0444363 and No. 0444468.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hernandez, O. et al. (2008). Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications. In: Mueller, M.S., Chapman, B.M., de Supinski, B.R., Malony, A.D., Voss, M. (eds) OpenMP Shared Memory Parallel Programming. IWOMP 2005. Lecture Notes in Computer Science, vol 4315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68555-5_22
DOI: https://doi.org/10.1007/978-3-540-68555-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68554-8
Online ISBN: 978-3-540-68555-5
eBook Packages: Computer Science (R0)