
PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers

International Journal of Parallel Programming

Abstract

This paper describes PETRA, a portable performance evaluation tool for parallelizing compilers and their individual techniques. Automatic parallelization of sequential programs, combined with performance tuning, is an important alternative to manual parallelization for exploiting the performance potential of today’s multicores. Given the renewed interest in autoparallelization, this paper aims at a comprehensive evaluation that identifies strengths and weaknesses in the underlying techniques. The findings allow engineers to make informed decisions about which techniques to include in industrial products and point researchers to potential improvements. We present an experimental methodology and a fully automated implementation for comprehensively evaluating the effectiveness of parallelizing compilers and their underlying optimization techniques. The methodology is the first to (1) include automatic tuning, (2) measure the performance contributions of individual techniques at multiple optimization levels, and (3) quantify the interactions of compiler optimizations. The results will also help close the gap between research compilers and industrial compilers, which still lag far behind. We applied the proposed methodology, implemented in PETRA, to five modern parallelizing compilers and their tuning capabilities, illustrating several use cases and applications of the evaluation tool. We report speedups, parallel coverage, and the number of parallel loops, using the NAS Benchmarks as the program suite. We found the parallelizers to be reasonably successful in about half of the given science and engineering programs. Another important finding is that some techniques can substitute for one another. Furthermore, we found that automatic tuning can yield significant additional performance and sometimes matches or outperforms hand-parallelized programs. Advanced versions of some of the techniques identified as most successful in previous compiler generations remain among the most important today, while other techniques have risen significantly in impact. Finally, we analyze specific reasons for the measured performance and the potential for improving automatic parallelization.
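To make the methodology's core measurement concrete: the per-technique contributions in item (2) can be obtained by rebuilding and re-timing a benchmark with one technique disabled at a time. The following is a minimal Python sketch of that idea, not the authors' PETRA implementation; the build commands, the per-technique disable flags, and the `./benchmark` executable are illustrative assumptions.

```python
# Minimal sketch of per-technique contribution measurement, as described in
# the abstract. This is NOT the authors' PETRA tool: the make targets, the
# disable flags, and ./benchmark are hypothetical placeholders.
import statistics
import subprocess
import time

# Hypothetical compiler flags, each turning one parallelization technique off.
TECHNIQUES = {
    "reduction":     "-no-reduction",
    "privatization": "-no-privatization",
    "induction":     "-no-induction",
}

def run_config(extra_flags: str, trials: int = 3) -> float:
    """Rebuild the benchmark with the given flags, then return the median
    wall-clock runtime over several trials."""
    subprocess.run(["make", "clean"], check=True)
    subprocess.run(["make", f"CFLAGS=-O3 -parallel {extra_flags}"], check=True)
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(["./benchmark"], check=True)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

serial_time = run_config("-no-parallel")   # serial baseline
full_time   = run_config("")               # all techniques enabled

print(f"overall speedup: {serial_time / full_time:.2f}x")
for name, off_flag in TECHNIQUES.items():
    t_off = run_config(off_flag)           # everything on except this one
    # A technique's contribution shows up as the slowdown when it is removed.
    print(f"{name:14s}: {(t_off - full_time) / full_time:+7.1%} runtime change")
```

Pairwise interactions, item (3) of the methodology, can be quantified analogously by disabling two techniques at once and comparing the resulting slowdown with the sum of the two individual slowdowns. Parallel coverage, one of the reported metrics, is the fraction of serial execution time spent in regions the compiler parallelized; by Amdahl's law, a coverage of c bounds the achievable speedup on p cores at 1/((1-c) + c/p).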



Author information

Correspondence to Dheya Mustafa.

Additional information

This work was supported, in part, by the National Science Foundation under grants No. CNS-0720471, 0707931-CNS, 0833115-CCF, and 0916817-CCF.


About this article

Cite this article

Mustafa, D., Eigenmann, R. PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers. Int J Parallel Prog 43, 549–571 (2015). https://doi.org/10.1007/s10766-014-0307-8
