Abstract
This paper proposes a data-localization compilation scheme for Fortran macro-dataflow processing on a multiprocessor system with local memory and centralized shared memory. The scheme minimizes the data transfer overhead of passing shared data among coarse-grain tasks composed of Doall loops and sequential loops by making effective use of the local memory on each processor. First, the compiler applies a loop aligned decomposition that partitions coarse-grain tasks, such as loops with data dependences among them, and their data into multiple groups so that data transfer among groups is minimized. Second, it generates a dynamic scheduling routine that assigns the decomposed tasks in a group to the same processor at run time. Third, it generates parallel machine code that passes shared data within a group through local memory. The compiler has been implemented for the OSCAR multiprocessor system, which has centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme reduces execution time by 10% to 20% on average compared with macro-dataflow processing without data localization.
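The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the OSCAR compiler's algorithm. It assumes two Doall loops over the same iteration range, where iteration i of the second loop reads only the element A[i] written by iteration i of the first loop, and it assumes that producer chunk g always runs on processor g. The function names aligned_chunks and shared_memory_transfers and the parameter consumer_proc are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[int, int]  # half-open iteration range [start, stop)


def aligned_chunks(n: int, groups: int) -> List[Chunk]:
    """Split iterations 0..n-1 into `groups` contiguous, near-equal ranges.

    Applying the same split to both loops makes chunk g of loop 1 and
    chunk g of loop 2 touch the same slice of A (assumed access pattern).
    """
    base, extra = divmod(n, groups)
    chunks, start = [], 0
    for g in range(groups):
        stop = start + base + (1 if g < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def shared_memory_transfers(n: int, groups: int, consumer_proc: List[int]) -> int:
    """Count elements of A that loop 2 must fetch through shared memory.

    Assumption: producer chunk g runs on processor g. If consumer chunk g
    runs on the same processor, its slice of A stays in local memory;
    otherwise the whole slice has to travel through shared memory.
    """
    transfers = 0
    for g, (start, stop) in enumerate(aligned_chunks(n, groups)):
        if consumer_proc[g] != g:
            transfers += stop - start
    return transfers


if __name__ == "__main__":
    n, groups = 100, 4
    localized = list(range(groups))                       # group g stays on processor g
    scattered = [(g + 1) % groups for g in range(groups)]  # consumer lands elsewhere
    print(shared_memory_transfers(n, groups, localized))   # 0
    print(shared_memory_transfers(n, groups, scattered))   # 100
```

In this toy model, keeping both chunks of a group on one processor removes all shared-memory traffic for A, while scattering the consumers forces every element through shared memory. Real programs have dependences that cross chunk boundaries, so the savings would be smaller than in this idealized case.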
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoshida, A., Kasahara, H. (1997). Data localization using loop aligned decomposition for macro-dataflow processing. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017245
DOI: https://doi.org/10.1007/BFb0017245
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive