
PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers

International Journal of Parallel Programming

Abstract

This paper describes PETRA, a portable performance evaluation tool for parallelizing compilers and their individual techniques. Automatic parallelization of sequential programs, combined with performance tuning, is an important alternative to manual parallelization for exploiting the performance potential of today’s multicores. Given the renewed interest in autoparallelization, this paper aims at a comprehensive evaluation that identifies strengths and weaknesses in the underlying techniques. The findings allow engineers to make informed decisions about which techniques to include in industrial products and point researchers to potential improvements. We present an experimental methodology and a fully automated implementation for comprehensively evaluating the effectiveness of parallelizing compilers and their underlying optimization techniques. The methodology is the first to (1) include automatic tuning, (2) measure the performance contributions of individual techniques at multiple optimization levels, and (3) quantify the interactions of compiler optimizations. The results will also help close the gap between research compilers and industrial compilers, which still lag far behind. We applied the proposed methodology, implemented in PETRA, to five modern parallelizing compilers and their tuning capabilities, illustrating several use cases and applications of the evaluation tool. We report speedups, parallel coverage, and the number of parallel loops, using the NAS Benchmarks as the program suite. We found the parallelizers to be reasonably successful in about half of the given science and engineering programs. Another important finding is that some techniques can substitute for one another. Furthermore, we found that automatic tuning can yield significant additional performance and sometimes matches or outperforms hand-parallelized programs. Advanced versions of some of the techniques identified as most successful in previous compiler generations remain among the most important today, while other techniques have risen significantly in impact. Finally, we analyze specific reasons for the measured performance and the potential for improving automatic parallelization.
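To make the methodology's core measurement concrete: the per-technique contributions in item (2) can be obtained by rebuilding and re-timing a benchmark with one technique disabled at a time. The following is a minimal Python sketch of that idea, not the authors' PETRA implementation; the build commands, the per-technique disable flags, and the `./benchmark` executable are illustrative assumptions.

```python
# Minimal sketch of per-technique contribution measurement, as described in
# the abstract. This is NOT the authors' PETRA tool: the make targets, the
# disable flags, and ./benchmark are hypothetical placeholders.
import statistics
import subprocess
import time

# Hypothetical compiler flags, each turning one parallelization technique off.
TECHNIQUES = {
    "reduction":     "-no-reduction",
    "privatization": "-no-privatization",
    "induction":     "-no-induction",
}

def run_config(extra_flags: str, trials: int = 3) -> float:
    """Rebuild the benchmark with the given flags, then return the median
    wall-clock runtime over several trials."""
    subprocess.run(["make", "clean"], check=True)
    subprocess.run(["make", f"CFLAGS=-O3 -parallel {extra_flags}"], check=True)
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(["./benchmark"], check=True)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

serial_time = run_config("-no-parallel")   # serial baseline
full_time   = run_config("")               # all techniques enabled

print(f"overall speedup: {serial_time / full_time:.2f}x")
for name, off_flag in TECHNIQUES.items():
    t_off = run_config(off_flag)           # everything on except this one
    # A technique's contribution shows up as the slowdown when it is removed.
    print(f"{name:14s}: {(t_off - full_time) / full_time:+7.1%} runtime change")
```

Pairwise interactions, item (3) of the methodology, can be quantified analogously by disabling two techniques at once and comparing the resulting slowdown with the sum of the two individual slowdowns. Parallel coverage, one of the reported metrics, is the fraction of serial execution time spent in regions the compiler parallelized; by Amdahl's law, a coverage of c bounds the achievable speedup on p cores at 1/((1-c) + c/p).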



Author information

Correspondence to Dheya Mustafa.

Additional information

This work was supported, in part, by the National Science Foundation under grants No. CNS-0720471, 0707931-CNS, 0833115-CCF, and 0916817-CCF.


About this article

Cite this article

Mustafa, D., Eigenmann, R. PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers. Int J Parallel Prog 43, 549–571 (2015). https://doi.org/10.1007/s10766-014-0307-8
