Abstract
A source-to-source (S2S) compiler is a type of translator that accepts the source code of a program written in one programming language as its input and produces equivalent source code in the same or a different programming language. S2S techniques are commonly used to enable fluent translation between high-level programming languages, to perform large-scale refactoring operations, and to facilitate instrumentation for dynamic analysis. This paper studies and evaluates negative perceptions about the applicability of S2S in High Performance Computing (HPC). It is a first study that brings to light the reasons why scientists do not use source-to-source techniques for HPC. The primary audience for this paper is those considering S2S technology in their HPC application work.
Acknowledgements
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. Images used by permission. SAND2021-9377 C.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-CONF-821299.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Milewicz, R., Pirkelbauer, P., Soundararajan, P., Ahmed, H., Skjellum, T. (2021). Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review. In: Jagode, H., Anzt, H., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science, vol 12761. Springer, Cham. https://doi.org/10.1007/978-3-030-90539-2_16
DOI: https://doi.org/10.1007/978-3-030-90539-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90538-5
Online ISBN: 978-3-030-90539-2
eBook Packages: Computer Science, Computer Science (R0)