Abstract
Massively parallel architectures are mainly based on a parallel heterogeneous setup. They are composed of different computing devices that speed up specific code regions, named kernels. These kernels are usually offloaded to the corresponding devices. Porting applications to a specific heterogeneous platform is a costly task in terms of time and human resources. The key points in the porting process are the manual analysis of the source code and kernel detection. Each device in these heterogeneous platforms has its own restrictions, such as its support for memory allocation, so kernels must be mapped to suitable computing devices. We introduced AKI, an automatic kernel identification and annotation tool that aims to identify potential kernels in sequential C++ applications. AKI identifies those kernels that can be offloaded to heterogeneous computing devices. To annotate these kernels, REPARA C++ attributes have been defined. This annotation mechanism can help future automatic source-to-source transformation tools target parallel heterogeneous platforms. AKI has been evaluated on all benchmarks included in the NAS suite, which incorporates a large set of realistic high-performance applications. The evaluation results demonstrate that AKI is a competitive solution for identifying and annotating parallel code fragments (i.e., kernels).
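To make the idea of attribute-based kernel annotation concrete, the following is a minimal sketch of a sequential C++ loop marked as a kernel candidate. The `demo::kernel` attribute is a hypothetical placeholder chosen for illustration, not the actual REPARA attribute spelling, and the `saxpy` function is an assumed example rather than code from the evaluated benchmarks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes out[i] = a * x[i] + y[i] for every element.
// Each iteration is independent of the others, which is the kind of
// region a kernel-identification tool could plausibly flag.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());  // both inputs must have the same length

    // Memory is allocated before the annotated region, since some
    // devices may not support allocation inside a kernel.
    std::vector<float> out(x.size());

    // Hypothetical attribute (NOT the real REPARA syntax). Compilers
    // ignore attributes they do not recognize, possibly with a warning,
    // so the code still builds and runs sequentially.
    [[demo::kernel]]
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}
```

Because the annotation is a standard C++ attribute, the program stays valid sequential code for any compiler, while a source-to-source tool that understands the attribute could pick out the marked loop for transformation.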
Acknowledgments
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 609666 and from the Spanish Ministry of Economics and Competitiveness under Grant TIN2013-41350-P.
Cite this article
del Rio Astorga, D., Sotomayor, R., Sanchez, L.M. et al. Assessing and discovering parallelism in C++ code for heterogeneous platforms. J Supercomput 74, 5674–5689 (2018). https://doi.org/10.1007/s11227-016-1794-8
DOI: https://doi.org/10.1007/s11227-016-1794-8