
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

Published in: International Journal of Parallel Programming

Abstract

This paper provides an overview and an evaluation of the Cetus source-to-source compiler infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler for research in automatic parallelization of C programs. Since then, Cetus has been used for many additional program transformation tasks and serves as a compiler infrastructure for numerous projects in the US and internationally. Recently, the National Science Foundation has supported Cetus with the goal of building it into a community resource. The compiler has gone through several iterations of benchmark studies followed by implementations of the techniques that improve the parallel performance of those programs. These efforts have resulted in a system that compares favorably with state-of-the-art parallelizers, such as Intel’s ICC. A key limitation of advanced optimizing compilers is their lack of runtime information, such as the program input data. We discuss and evaluate several techniques that support dynamic optimization decisions. Finally, given the extensive body of proposed compiler analyses and transformations for parallelization, the question arises which of these techniques matter most. To that end, this paper evaluates the impact of the individual Cetus techniques on overall program performance.
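To make the kind of transformation concrete, the sketch below pairs a plain C loop with the OpenMP-annotated form that a source-to-source parallelizer such as Cetus can emit. It is a minimal, hypothetical example, not output taken from the paper: the array names, the trip-count threshold in the if clause, and the surrounding driver code are assumptions made for illustration.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        double t, sum = 0.0;
        int i, n = N;

        /* Initialize the input arrays (illustrative data). */
        for (i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

        /* Serial loop of the kind the paper targets: t is privatizable and
         * sum is a reduction variable.  The directive below is what an
         * automatic parallelizer can add.  The if(n > 10000) clause sketches
         * a dynamic optimization decision: the parallel version runs only
         * when the run-time trip count makes it worthwhile (the threshold
         * is an assumed placeholder). */
        #pragma omp parallel for private(t) reduction(+:sum) if(n > 10000)
        for (i = 0; i < n; i++) {
            t = a[i] * b[i];
            sum += t;
        }

        printf("sum = %f\n", sum);
        return 0;
    }

Compiled with OpenMP enabled (for example, gcc -fopenmp), the loop executes in parallel; without the flag, the pragma is ignored and the program remains valid serial C, which is one reason OpenMP-annotated C is a convenient target for source-to-source parallelizers.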



Author information


Corresponding author

Correspondence to Hansang Bae.


About this article

Cite this article

Bae, H., Mustafa, D., Lee, JW. et al. The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation. Int J Parallel Prog 41, 753–767 (2013). https://doi.org/10.1007/s10766-012-0211-z
