Support of automatic parallelization with concept comprehension

https://doi.org/10.1016/S1383-7621(98)00016-2

Abstract

Current approaches to parallelizing compilation perform a purely structural analysis of the sequential code. Conversely, a semantic analysis that performs concept assignment for code sections can support the recognition of the algorithms that the code implements. This can considerably help the parallelization process by allowing the introduction of heuristics and an extensive pruning of the search space, thus enabling the application of more aggressive code transformations. It can play an important role in overcoming the current limitations of automatic parallelization. In this paper we discuss the applicability of concept comprehension to the parallelization process, and we present a novel technique for automatic algorithmic recognition that we have designed and implemented. We are currently developing a reverse engineering tool, based on this recognition technique, which supports the translation of sequential Fortran code into HPF. Its working criteria are illustrated and discussed.

Introduction

An important approach to the programming of parallel architectures is based on the (semi-)automatic transformation of sequential code, possibly augmented with parallelization directives, into explicitly parallel code. This method is not only useful for supporting the reuse of “dusty decks”, but also has important advantages when developing an application from scratch:

  • ease of design, development, verification and debugging of sequential code,

  • existence of several environments for the support of sequential programming, and

  • portability – the compiler performs the mapping from the sequential code to a code suited to a particular parallel target architecture, including machine-specific optimizations.

The main problem with this approach is the huge complexity of the search procedure for finding a close-to-optimal parallel version of a given sequential code for a given target architecture.

If the target machine is a distributed memory architecture, the parallelization procedure can be thought of as composed of the following conceptual phases:

  • Selection of a parallel execution model (defined by a spawning, communication and synchronization topology among processes);

  • selection of a data distribution across the processes defined in the execution model;

  • selection of a work distribution across the processes, and consequent code decomposition and assignment to processes (possibly with replications);

  • analysis and optimization of the communication needed by nonlocal accesses.

Each phase is implemented by a series of transformations applied to the code. Current approaches try to prune the search space associated with the selection of transformations by enforcing tight constraints on the alternatives allowed in each phase (constraints that depend neither on the code to be parallelized nor on the target architecture) and by delegating some decisions to the user.

First, the execution model is generally selected a priori. It is referred to in the literature as the Single Program Multiple Data (SPMD) paradigm [8,22] (in its version for the distributed memory architectural model).

Secondly, a default rule (such as the owner-computes rule [33]) determines the work distribution, in particular within parallel loops.

Within these constraints, data distribution and work distribution become tightly coupled problems: a solution to one induces a solution to the other, provided the computation is characterized by regular accesses to data [33].
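To make this coupling concrete, here is a minimal HPF-style sketch of the owner-computes rule under the SPMD model; the array names, sizes and processor count are our own illustrative choices, not taken from the paper.

      PROGRAM OWNER
      INTEGER, PARAMETER :: N = 1000
      REAL A(N), B(N)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ ALIGN B(I) WITH A(I)
      INTEGER I
      B = 1.0
! Under the owner-computes rule, the processor owning A(I)
! executes iteration I; the alignment makes B(I) local, so
! nonlocal accesses (to B(I-1)) arise only at block boundaries.
      DO I = 2, N
         A(I) = B(I-1) + B(I)
      END DO
      PRINT *, A(N)
      END

The distribution directives fix the owners of A; the work distribution then follows automatically, and communication is confined to the block boundaries.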

Two orthogonal strategies arise from this scenario. One of them [1] aims at the automatic selection of sequences of unimodular transformations of loop nests, leading to loop iteration distributions that are optimal with regard to the locality of accesses.

The other strategy [34,35,21] chooses the data distribution to drive the parallelization procedure, determining the work distribution and the required communication.

Unfortunately, even after the imposition of such tight constraints, the complexity of the problem remains intractable in the general case. For instance, within the approaches that attempt to determine the data distribution automatically [26,19,6,1], the main open questions are: (i) finding the portions of the code (phases) in which the data distribution has to remain unchanged; (ii) finding suitable redistributions between phases; (iii) for each phase, finding alignments among the array dimensions such that the cost of communication due to alignment conflicts is minimal (alignment conflict resolution); (iv) for each phase, finding the distributions of the array dimensions that maximize data locality, thus minimizing the communication overhead.
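As an illustration of points (i) and (ii), the following sketch (the computation and sizes are invented for the example) shows two phases whose access patterns favour different distributions, separated by an explicit redistribution:

      PROGRAM PHASES
      INTEGER, PARAMETER :: N = 512
      REAL A(N,N), R(N), C(N)
      INTEGER I, J
!HPF$ DYNAMIC A
!HPF$ DISTRIBUTE A(BLOCK,*)
      A = 1.0
! Phase 1: row-oriented sweeps; (BLOCK,*) keeps each row local.
      DO I = 1, N
         R(I) = SUM(A(I,:))
      END DO
! Phase boundary: the access pattern changes, so redistribute.
!HPF$ REDISTRIBUTE A(*,BLOCK)
! Phase 2: column-oriented sweeps; (*,BLOCK) keeps each column local.
      DO J = 1, N
         C(J) = SUM(A(:,J))
      END DO
      PRINT *, R(1), C(1)
      END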

Finally, we note that the use of a fixed parallel execution model, even one as flexible and general as the SPMD model, may prevent the selection of the parallelization strategy best suited to the characteristics of the code to be parallelized. For instance, in the context of irregular problems, execution models alternative to SPMD have been pointed out. Several authors [4,7,16,17,18] have recently argued that parallel programs can be classified as belonging to a parallel programming paradigm (also called a template or skeleton), based on the way the processes forming the parallel program are created, synchronize and communicate, abstracting from the details of their sequential parts. According to this view, the SPMD model can be viewed as one paradigm; other paradigms may provide more flexible approaches to program parallelization. The idea is that a parallelizing compiler should be able to select the paradigm best suited to the characteristics of the algorithm, of the target architecture and of the run-time parameters, and derive the final parallel code by applying template-based code transformations [9].
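As a concrete sketch of one alternative paradigm, the fragment below outlines a processor farm (master/worker) in message-passing style; the task payload, squaring an integer, is purely illustrative. The paradigm is characterized precisely by the process roles and their communication structure, not by the sequential computation performed.

      PROGRAM FARM
      INCLUDE 'mpif.h'
      INTEGER RANK, NPROC, IERR, TASK, RES, STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (RANK .EQ. 0) THEN
! Master: hand one task to each worker, then collect the results.
         DO TASK = 1, NPROC - 1
            CALL MPI_SEND(TASK, 1, MPI_INTEGER, TASK, 0,
     &                    MPI_COMM_WORLD, IERR)
         END DO
         DO TASK = 1, NPROC - 1
            CALL MPI_RECV(RES, 1, MPI_INTEGER, MPI_ANY_SOURCE, 1,
     &                    MPI_COMM_WORLD, STATUS, IERR)
         END DO
      ELSE
! Worker: receive a task, compute, send the result back.
         CALL MPI_RECV(TASK, 1, MPI_INTEGER, 0, 0,
     &                 MPI_COMM_WORLD, STATUS, IERR)
         RES = TASK * TASK
         CALL MPI_SEND(RES, 1, MPI_INTEGER, 0, 1,
     &                 MPI_COMM_WORLD, IERR)
      END IF
      CALL MPI_FINALIZE(IERR)
      END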

In this paper we present a novel approach to supporting parallelizing compilation: the integration of techniques for the automatic recognition of algorithmic patterns within source code (concept comprehension). It can play an important role in overcoming the current limitations of automatic parallelization.

In Section 2 we discuss the applicability of concept comprehension techniques to the parallelization process. A novel technique we have designed for automatic algorithmic recognition is then presented (Section 3), together with a prototype tool which implements it.

We are currently developing a reverse engineering tool supporting the translation of sequential Fortran code into HPF, Migrator, which is based on the recognition technique we have developed. Its design is presented in Section 4. In Section 5 the features of the procedure for automated algorithmic recognition we have implemented are exemplified through a case study, which also shows an application to code restructuring for the translation of Fortran into HPF.

Section snippets

Parallelization and concept comprehension

Current approaches to parallelizing compilation perform a purely structural analysis of the sequential code. Conversely, a semantic analysis performing concept assignment [3] for code sections can support the recognition of the algorithms that the code implements. This can considerably help the parallelization process, by allowing the introduction of heuristics and an extensive pruning of the search space, and thus enabling the application of more aggressive code transformations. …

A technique for automatic algorithmic recognition

In order to effectively enable the application of the parallelization strategies described above, the algorithm recognition technique has to be powerful enough to face and solve the following problems.

  • Syntactic variations: they consist of the different possible implementations of the same algorithmic pattern which result in the same control and data flow structure (see the sketch after this list).

  • Implementation variations: an algorithmic concept represents an abstract algorithmic functionality. An instance of it can thus be an…
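As a sketch of the syntactic variation problem (the function names are invented for the example), the two fragments below implement the same dot product concept with different syntax but identical control and data flow, so a recognizer working on flow representations should match both:

      REAL FUNCTION DOT1(X, Y, N)
      INTEGER N, I
      REAL X(N), Y(N)
! Variant 1: dot product as a counted DO loop.
      DOT1 = 0.0
      DO I = 1, N
         DOT1 = DOT1 + X(I)*Y(I)
      END DO
      END

      REAL FUNCTION DOT2(X, Y, N)
      INTEGER N, I
      REAL X(N), Y(N)
! Variant 2: the same dot product concept expressed with an
! IF/GOTO loop: different syntax, same control and data flow.
      DOT2 = 0.0
      I = 1
   10 IF (I .GT. N) GOTO 20
      DOT2 = DOT2 + X(I)*Y(I)
      I = I + 1
      GOTO 10
   20 CONTINUE
      END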

Migrator: A tool supporting the translation of Fortran into HPF

We are currently developing a reverse engineering tool supporting the translation of sequential Fortran code into HPF, Migrator, which is based on the recognition technique described above.

Migrator will accept structured Fortran 77, Fortran 90, or partially annotated HPF and will produce fully annotated HPF code with (when applicable) insertion of calls to optimized parallel libraries.

The tool can run in a fully automatic or interactive mode. Since the automatic mode is restricted to static…
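The intended input/output relationship can be sketched as follows; this example is ours and merely illustrative, not actual tool output. The recognized triple loop is replaced by distribution directives and a single optimized operation (here the Fortran 90 intrinsic MATMUL stands in for a call to an optimized parallel library):

      SUBROUTINE MM77(A, B, C, N)
      INTEGER N, I, J, K
      REAL A(N,N), B(N,N), C(N,N)
! Input form (Fortran 77): triple loop implementing C = A*B.
      DO I = 1, N
         DO J = 1, N
            C(I,J) = 0.0
            DO K = 1, N
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
            END DO
         END DO
      END DO
      END

      SUBROUTINE MMHPF(A, B, C, N)
      INTEGER N
      REAL A(N,N), B(N,N), C(N,N)
! Possible annotated output once the matrix-matrix product
! concept is recognized: HPF distribution directives plus a
! single optimized operation.
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ DISTRIBUTE C(BLOCK,*)
      C = MATMUL(A, B)
      END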

Example of algorithmic recognition and application to code restructuring

In this section the features of the procedure for automated algorithmic recognition we have implemented are exemplified through a case study.

The algorithmic concept we consider is the matrix–matrix product, together with the matrix–vector product concept, which is a subconcept of it.

In the following we present some examples of recognition of instances of the matrix–matrix product concept (and consequently of the composing subconcepts, matrix–vector product and dot product). We show how the…
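As a sketch of the kind of instance involved (our own example, with invented names), here the dot product subconcept is factored into a separate function, so recognition must proceed hierarchically, from dot product through matrix–vector product up to matrix–matrix product, and across procedure boundaries. By Fortran's column-major sequence association this variant actually computes C = transpose(A)*B, itself an implementation variation of the concept:

      REAL FUNCTION DOT(X, Y, N)
      INTEGER N, K
      REAL X(N), Y(N)
! Innermost subconcept: dot product.
      DOT = 0.0
      DO K = 1, N
         DOT = DOT + X(K)*Y(K)
      END DO
      END

      SUBROUTINE MATMAT(A, B, C, N)
      INTEGER N, I, J
      REAL A(N,N), B(N,N), C(N,N)
      REAL DOT
      EXTERNAL DOT
! For fixed J, the I loop realizes a matrix-vector product
! (columns of A against column J of B); the J loop over these
! realizes the matrix-matrix product concept.
      DO J = 1, N
         DO I = 1, N
            C(I,J) = DOT(A(1,I), B(1,J), N)
         END DO
      END DO
      END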

Conclusions

In this paper a hierarchical concept parsing recognition technique has been presented. Its flexibility and expressive power in specifying the hierarchy, the constraints and the relationships among concepts allow it to deal with the recognition of algorithmic concepts within optimized code and irregular computations, and in the presence of code sharing, delocalization, implementation variations and other problems related to program recognition, within the context of imperative languages typically used…

References

  • J. Anderson, M. Lam, Global optimizations for parallelism and locality on scalable parallel machines, in: Proceedings...
  • S. Bhansali, J.R. Hagemeister, A pattern matching approach for reusing software libraries in parallel systems, in:...
  • T.J. Biggerstaff, The concept assignment problem in program understanding, in: Proceedings of IEEE Working Conference...
  • P. Brinch Hansen, Model programs for computational science: A programming methodology for multicomputers, Concurrency: Practice and Experience (1993)
  • P. Bose, Interactive program improvement via EAVE: An expert adviser for VEctorization, in: Proceedings of...
  • B. Chapman, H. Herbeck, H. Zima, Automatic support for data distribution, in: Proceedings of Sixth Distributed Memory...
  • M.I. Cole, Algorithmic Skeletons: Structured Management of Parallel Computation, MIT Press, Cambridge, MA,...
  • F. Darema et al., A single program multiple data computational model for EPEX/FORTRAN, Parallel Comput. 7 (1988)...
  • B. Di Martino, G. Iannello, Towards automatic parallelization through program comprehension, in: Proceedings of Third...
  • B. Di Martino, G. Iannello, Parallelization of nonsimultaneous iterative methods for systems of linear equations, in:...
  • B. Di Martino, Sviluppo di Tecniche di Riconoscimento Automatico di Algoritmi nel Codice per il Supporto alla...
  • B. Di Martino, B. Chapman, G. Iannello, H. Zima, Integration of Program Comprehension Techniques into the Vienna...
  • B. Di Martino, C.W. Kessler, Program comprehension engines for automatic parallelization: A comparative study, in:...
  • B. Di Martino, G. Iannello, PAP recognizer: A tool for automatic recognition of parallelizable patterns, in:...
  • J. Ferrante et al., The program dependence graph and its use in optimization, ACM Trans. Programming Languages Systems (1987)
  • I. Foster, M. Xu, Libraries for parallel paradigm integration, in: Toward Teraflop Computing and New Grand Challenge...
  • I. Foster, D. Walker, Paradigms and strategies for scientific computing on distributed memory concurrent computers, in:...

Beniamino Di Martino received the M.S. degree (magna cum laude) in Physics and the Ph.D. degree in Computer Science, both from University of Naples, in 1992 and 1996, respectively. In 1994 he joined the Institute for Software Technology and Parallel Systems at the University of Vienna where he is currently a Researcher. He is Adjunct Professor of Applied Computer Science at the Electronic Engineering Faculty – Second University of Naples. He was consultant for IBM Semea and ENEA (Italian National Agency for New Technology, Energy and the Environment) in Rome, working in the area of parallelization of plasma confinement simulations for nuclear fusion. He is author of more than 20 publications in international journals and conferences. His research interests include Compiler Techniques for High Performance Computing, Program Analysis and Transformation, Reverse Engineering, Pattern Recognition, Image Analysis and Understanding. He is member of the International Association for Pattern Recognition.

Hans P. Zima studied Mathematics, Physics and Astronomy at the University of Vienna, where he received his Ph.D. degree in 1964. After working for more than eight years for German and US computer manufacturers and software companies, he accepted a research position at the University of Karlsruhe. He was appointed Professor of Computer Science at the University of Bonn, Germany, in 1975. During his tenure in Bonn, he led the development of SUPERB, the first Fortran parallelization system for distributed-memory machines. Furthermore, he performed research in the field of constraint languages at the IBM San Jose Research Laboratory in 1983–84 and initiated a cooperation with Rice University, where he spent a year in 1988–89. He is currently Professor of Applied Computer Science and Head of the Institute for Software Technology and Parallel Systems at the University of Vienna. He guided the development of Vienna Fortran, one of the major inputs for the High Performance Fortran effort, and of the Vienna Fortran Compilation System (VFCS). He is the author of several books and more than 100 publications in the areas of programming languages and compilers for parallel systems. His book “Supercompilers for Parallel and Vector Computers” (co-authored with Barbara Chapman) is one of the first coherent treatments of an important emerging discipline of applied computer science. His main research interests are in the field of advanced languages, programming environments and software tools for massively parallel machines, in particular automatic parallelization, performance analysis, and knowledge-based transformation systems. He led research efforts in the context of the ESPRIT projects GENESIS, PREPARE and PPPE, and is currently involved in the ESPRIT IV LTR project HPF+.
