
Data localization using loop aligned decomposition for macro-dataflow processing

  • Automatic Data Distribution and Locality Enhancement
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

This paper proposes a data-localization compilation scheme for Fortran macro-dataflow processing on a multiprocessor system with local memory and centralized shared memory. The scheme minimizes the overhead of passing shared data among coarse-grain tasks composed of Doall loops and sequential loops by making effective use of the local memory on each processor. In this scheme, the compiler first partitions coarse-grain tasks, such as loops with data dependences among them, together with their data into multiple groups by loop aligned decomposition, so that data transfer among groups is minimized. Second, it generates a dynamic scheduling routine that assigns the decomposed tasks in a group to the same processor at run time. Third, it generates parallel machine code that passes shared data within a group through local memory. The compiler has been implemented for the OSCAR multiprocessor system, which provides centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme reduces execution time by 10% to 20% on average compared with macro-dataflow processing without data localization.
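
As a rough illustration of the decomposition step described above, the sketch below block-partitions the iteration space of a producer Doall loop and aligns the iterations of a consumer loop that reads the array it writes, so that the data produced and consumed within a group can stay in one processor's local memory. This is a minimal sketch of the general idea under simplifying assumptions (a single shared array and a uniform dependence distance dep_offset), not the authors' algorithm; the function name and the block-partitioning strategy are illustrative.

    # Minimal sketch (illustrative, not the paper's implementation) of aligning
    # a producer loop and a dependent consumer loop into the same groups.
    def loop_aligned_decomposition(n_iters, n_groups, dep_offset=0):
        """Block-partition producer iterations 0..n_iters-1 into n_groups chunks
        and align consumer iterations to the group that produced their operands.
        Consumer iteration i is assumed to read the value written by producer
        iteration i - dep_offset (a uniform dependence distance)."""
        chunk = (n_iters + n_groups - 1) // n_groups
        groups = []
        for g in range(n_groups):
            lo, hi = g * chunk, min((g + 1) * chunk, n_iters)
            prod = list(range(lo, hi))
            # Consumer iterations whose source value is produced inside [lo, hi);
            # keeping them in the same group lets that section of the shared
            # array stay in local memory. Boundary iterations i < dep_offset
            # have no in-loop producer and are simply left to group 0 here.
            cons = [i for i in range(n_iters)
                    if lo <= i - dep_offset < hi or (g == 0 and i < dep_offset)]
            groups.append((prod, cons))
        return groups

    # Example: 16 iterations, 4 groups, dependence distance 1.
    for gid, (prod, cons) in enumerate(loop_aligned_decomposition(16, 4, dep_offset=1)):
        print("group", gid, "producer:", prod, "consumer:", cons)

In the scheme's second and third steps, each such group would then be handed to the dynamic scheduler as a unit, so the aligned producer and consumer chunks run on the same processor and exchange the shared array section through its local memory.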



Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoshida, A., Kasahara, H. (1997). Data localization using loop aligned decomposition for macro-dataflow processing. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017245

  • DOI: https://doi.org/10.1007/BFb0017245

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
