
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

  • Conference paper
  • First Online:
Languages and Compilers for Parallel Computing (LCPC 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2017)

Abstract

This paper proposes a simple and efficient implementation method for a hierarchical coarse-grain task parallel processing scheme on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code containing OpenMP directives, and its performance is evaluated on a commercial SMP machine. Coarse-grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from single-chip multiprocessors to high-performance computers, beyond the limits of loop parallelism. The proposed scheme decomposes a Fortran program into coarse-grain tasks; analyzes the parallelism among tasks by “Earliest Executable Condition Analysis,” which considers both control and data dependences; either statically schedules the coarse-grain tasks to threads or generates dynamic task scheduling code that assigns tasks to threads at run time; and generates OpenMP Fortran source code for an SMP machine. The OpenMP thread-parallel code generated by the OSCAR compiler forks threads only once at the beginning of the program and joins them only once at the end, even though the program is processed in parallel according to the hierarchical coarse-grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, an IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler combined with IBM XL Fortran compiler version 5.1 gives 1.5 to 3 times the speedup of the native XL Fortran compiler on SPEC 95fp SWIM, TOMCATV, HYDRO2D, and MGRID and on the Perfect Benchmarks ARC2D.
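The fork-once/join-once structure described in the abstract can be pictured with a minimal OpenMP Fortran sketch. This is an assumed illustration, not the OSCAR compiler's actual output: the macro-task subroutines mt1 through mt4 and their assignment to sections are hypothetical placeholders. A single PARALLEL SECTIONS region forks the threads exactly once, each thread runs the coarse-grain tasks statically scheduled to its section, and the implicit barrier at the end of the region is the only join in the program.

  ! Sketch of fork-once/join-once static coarse-grain scheduling.
  ! The macro-tasks mt1..mt4 are hypothetical placeholders.
  program coarse_grain_static_sketch
    implicit none
    real :: a(1000), b(1000)
    a = 1.0
    b = 2.0
    ! Threads fork here, exactly once for the whole program.
    !$omp parallel sections
    !$omp section
      ! Macro-tasks statically scheduled to the first thread.
      call mt1(a)
      call mt3(a)
    !$omp section
      ! Macro-tasks statically scheduled to the second thread.
      call mt2(b)
      call mt4(b)
    !$omp end parallel sections
    ! Implicit join here, exactly once.
    print *, a(1), b(1)
  contains
    subroutine mt1(x)
      real, intent(inout) :: x(:)
      x = 2.0 * x
    end subroutine mt1
    subroutine mt2(x)
      real, intent(inout) :: x(:)
      x = x + 1.0
    end subroutine mt2
    subroutine mt3(x)
      real, intent(inout) :: x(:)
      x = x - 0.5
    end subroutine mt3
    subroutine mt4(x)
      real, intent(inout) :: x(:)
      x = x / 2.0
    end subroutine mt4
  end program coarse_grain_static_sketch

For the dynamic-scheduling alternative mentioned in the abstract, an equally simplified sketch is possible: inside the same single parallel region, each thread repeatedly fetches the next macro-task index from a shared counter under a critical section. A real scheduler would also have to respect the earliest executable conditions among tasks, which this toy version omits.

  ! Sketch of dynamic task scheduling inside one parallel region.
  program coarse_grain_dynamic_sketch
    implicit none
    integer, parameter :: n_tasks = 8
    integer :: next_task, my_task
    next_task = 1
    !$omp parallel private(my_task)
    do
      ! Fetch the next macro-task index from the shared pool.
      !$omp critical (task_pool)
      my_task = next_task
      next_task = next_task + 1
      !$omp end critical (task_pool)
      if (my_task > n_tasks) exit
      ! Dispatch to the fetched macro-task (placeholder work).
      print *, 'running macro-task', my_task
    end do
    !$omp end parallel
  end program coarse_grain_dynamic_sketch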





Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kasahara, H., Obata, M., Ishizaka, K. (2001). Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_13


  • DOI: https://doi.org/10.1007/3-540-45574-4_13

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42862-6

  • Online ISBN: 978-3-540-45574-5

  • eBook Packages: Springer Book Archive
