Abstract
Data-parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization problem on a colored graph. We present a technique that records the run-time inter-node communication caused by moving array data between nodes during execution and builds the colored graph from this record. We also provide a simple algorithm that optimizes the coloring of this graph; the optimized coloring describes new data distributions that would result in less inter-node communication. From the distribution information, we generate compiler pragmas for use in the application program.
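To make the graph formulation concrete, here is a minimal Python sketch under stated assumptions: array elements are graph vertices, a vertex's color is the processor node that holds it, and an edge's weight counts how often the traced program brought its two endpoints together. The trace format, the function names (`build_graph`, `comm_cost`, `improve_coloring`), and the pairwise-swap heuristic are hypothetical illustrations, not the authors' tool or their algorithm; the heuristic is a greatly simplified relative of Kernighan-Lin partitioning. Generating compiler pragmas from the final coloring is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(trace):
    """Turn an access trace into an undirected weighted graph.

    Hypothetical trace format: each record (u, v) says array element u had to
    be brought together with array element v in one parallel operation.
    Edge weight = number of times that pair was co-accessed.
    """
    weights = defaultdict(int)
    for u, v in trace:
        if u != v:
            edge = (u, v) if u <= v else (v, u)  # canonical order for an undirected edge
            weights[edge] += 1
    return dict(weights)


def comm_cost(weights, color):
    """Inter-node communication: total weight of edges whose endpoints
    are colored with different nodes."""
    return sum(w for (u, v), w in weights.items() if color[u] != color[v])


def improve_coloring(weights, color):
    """Greedy balanced recoloring (an illustrative heuristic, not
    necessarily the paper's algorithm).

    Try swapping the colors of every pair of vertices on different nodes and
    keep a swap only if it strictly lowers comm_cost. Swaps never change how
    many elements each node holds, so the load stays balanced. Each accepted
    swap lowers a non-negative integer cost, so the loop terminates.
    """
    color = dict(color)
    best = comm_cost(weights, color)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(color), 2):
            if color[u] == color[v]:
                continue
            color[u], color[v] = color[v], color[u]
            cost = comm_cost(weights, color)
            if cost < best:
                best = cost
                improved = True
            else:
                color[u], color[v] = color[v], color[u]  # undo the swap
    return color


# Usage: A(i) is combined with B(i) ten times, but B is laid out on the
# opposite node from A, so every co-access crosses nodes.
trace = [(("A", i), ("B", i)) for _ in range(10) for i in range(4)]
weights = build_graph(trace)
initial = {**{("A", i): i // 2 for i in range(4)},      # A blocked: nodes 0,0,1,1
           **{("B", i): 1 - i // 2 for i in range(4)}}  # B reversed: nodes 1,1,0,0
optimized = improve_coloring(weights, initial)
print(comm_cost(weights, initial), "->", comm_cost(weights, optimized))  # prints: 40 -> 0
```

Swapping colors rather than recoloring single vertices is a deliberate choice in this sketch: it keeps each node's share of the data fixed, so reducing communication cannot come at the price of piling every element onto one node. Each pass tries every vertex pair and recomputes the full cost, so this version is only practical for small graphs.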
Using these techniques, we traced the execution of a real data-parallel application (written in CM Fortran) and collected its array access information. We computed new distributions that should reduce overall program execution time. However, compiler optimizations and poor interfaces between the compiler and the runtime system counteracted any potential benefit from the new data layouts. In light of this experience, we offer a set of recommendations for compiler writers that we believe are needed both to write efficient programs and to build the next generation of tools for parallel systems.
The techniques we have developed form the basis for future work on monitoring array access patterns and generating on-the-fly redistributions of arrays.
This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Materiel Command, USAF, under grant F33615-94-1-1525 (ARPA order no. B550), NSF Grants CCR-9100968 and CDA-9024618, Department of Energy Grant DE-FG02-93ER25176, and Office of Naval Research Grant N00014-89-J-1222. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kunchithapadam, K., Miller, B.P. (1995). Optimizing array distributions in data-parallel programs. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025897
DOI: https://doi.org/10.1007/BFb0025897
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58868-9
Online ISBN: 978-3-540-49134-7
eBook Packages: Springer Book Archive