
Data localization using loop aligned decomposition for macro-dataflow processing

  • Automatic Data Distribution and Locality Enhancement
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

This paper proposes a data-localization compilation scheme for Fortran macro-dataflow processing on a multiprocessor system with local memory and centralized shared memory. The scheme minimizes the overhead of passing shared data among coarse-grain tasks composed of Doall loops and sequential loops by making effective use of the local memory on each processor. In this scheme, the compiler first partitions coarse-grain tasks, such as loops with data dependences among them, together with their data into multiple groups by loop aligned decomposition, so that data transfer among groups is minimized. Second, it generates a dynamic scheduling routine that assigns the decomposed tasks in a group to the same processor at run time. Third, it generates parallel machine code that passes shared data within a group through local memory. The compiler has been implemented for the OSCAR multiprocessor system, which provides centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme reduces execution time by 10% to 20% on average compared with macro-dataflow processing without data localization.
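
As a rough illustration of the decomposition step described above, the sketch below block-partitions the iteration space of a producer Doall loop and aligns the iterations of a consumer loop that reads the array it writes, so that the data produced and consumed within a group can stay in one processor's local memory. This is a minimal sketch of the general idea under simplifying assumptions (a single shared array and a uniform dependence distance dep_offset), not the authors' algorithm; the function name and the block-partitioning strategy are illustrative.

    # Minimal sketch (illustrative, not the paper's implementation) of aligning
    # a producer loop and a dependent consumer loop into the same groups.
    def loop_aligned_decomposition(n_iters, n_groups, dep_offset=0):
        """Block-partition producer iterations 0..n_iters-1 into n_groups chunks
        and align consumer iterations to the group that produced their operands.
        Consumer iteration i is assumed to read the value written by producer
        iteration i - dep_offset (a uniform dependence distance)."""
        chunk = (n_iters + n_groups - 1) // n_groups
        groups = []
        for g in range(n_groups):
            lo, hi = g * chunk, min((g + 1) * chunk, n_iters)
            prod = list(range(lo, hi))
            # Consumer iterations whose source value is produced inside [lo, hi);
            # keeping them in the same group lets that section of the shared
            # array stay in local memory. Boundary iterations i < dep_offset
            # have no in-loop producer and are simply left to group 0 here.
            cons = [i for i in range(n_iters)
                    if lo <= i - dep_offset < hi or (g == 0 and i < dep_offset)]
            groups.append((prod, cons))
        return groups

    # Example: 16 iterations, 4 groups, dependence distance 1.
    for gid, (prod, cons) in enumerate(loop_aligned_decomposition(16, 4, dep_offset=1)):
        print("group", gid, "producer:", prod, "consumer:", cons)

In the scheme's second and third steps, each such group would then be handed to the dynamic scheduler as a unit, so the aligned producer and consumer chunks run on the same processor and exchange the shared array section through its local memory.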



Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoshida, A., Kasahara, H. (1997). Data localization using loop aligned decomposition for macro-dataflow processing. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017245

  • DOI: https://doi.org/10.1007/BFb0017245

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
