Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor

Ishizaka, Kazuhisa; Obata, Motoki; Kasahara, Hironori

doi:10.1007/3-540-35767-X_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2624)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

In multiprocessor systems, the gap between peak and effective performance has been getting larger. To cope with this performance gap, it is important to use multigrain parallelism in addition to ordinary loop-level parallelism. Effective use of the memory hierarchy is also important for improving the performance of multiprocessor systems, because the speed gap between processors and memories is widening.

This paper describes coarse grain task parallel processing that exploits parallelism among macro-tasks, such as loops and subroutines, while optimizing cache use through a data localization scheme. The proposed scheme is implemented in the OSCAR automatic multigrain parallelizing compiler. The OSCAR compiler generates an OpenMP FORTRAN program that realizes the proposed scheme from a sequential FORTRAN77 program. Its performance is evaluated on an IBM RS6000 SP 604e High Node, an 8-processor SMP machine, using the SPEC95fp benchmarks tomcatv, swim, and mgrid. In the evaluation, the proposed coarse grain task parallel processing scheme with cache optimization achieves speedups of up to 1.3 times on 1 PE, 4.7 times on 4 PEs, and 8.8 times on 8 PEs over sequential processing time.
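
As a rough illustration of the general idea behind data localization, the following Python sketch reorders two loops that sweep the same array so that both run on one cache-sized chunk before moving to the next. This is not the OSCAR compiler's algorithm or its generated code: the function names (`loop_a`, `loop_b`, `run_localized`), the chunk size, and the loop bodies are all hypothetical, and the reordering is only valid here because iteration i of the second loop depends solely on iteration i of the first.

```python
def split_into_chunks(n, chunk):
    """Return half-open index ranges [lo, hi) covering 0..n in pieces of at most `chunk`."""
    return [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]


def loop_a(data, lo, hi):
    # Hypothetical first macro-task: scale one range of the array in place.
    for i in range(lo, hi):
        data[i] = data[i] * 2.0


def loop_b(data, lo, hi):
    # Hypothetical second macro-task: element i uses only element i from loop_a.
    for i in range(lo, hi):
        data[i] = data[i] + 1.0


def run_original(data):
    # Original order: each loop sweeps the whole array. For an array larger
    # than the cache, early elements may be evicted before loop_b reuses them.
    loop_a(data, 0, len(data))
    loop_b(data, 0, len(data))
    return data


def run_localized(data, chunk):
    # Localized order: run both loops on one chunk before moving to the next,
    # so the chunk can be reused while it is still likely to be cached.
    order = []
    for lo, hi in split_into_chunks(len(data), chunk):
        loop_a(data, lo, hi)
        order.append(("loop_a", lo, hi))
        loop_b(data, lo, hi)
        order.append(("loop_b", lo, hi))
    return data, order
```

Calling `run_localized(values, chunk)` returns the same array contents as `run_original(values)` together with the order in which the chunked macro-tasks ran. Python itself will not show a cache benefit; the sketch only demonstrates the execution order such a scheme aims for.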

Author information

Authors and Affiliations

Dept. EECE, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo, 169-8555, Japan
Kazuhisa Ishizaka, Motoki Obata & Hironori Kasahara
Advanced Parallelizing Compiler Project, Japan
Kazuhisa Ishizaka, Motoki Obata & Hironori Kasahara

Editor information

Editors and Affiliations

Electrical and Computer Engineering Department, University of Kentucky, Lexington, KY, 40506-0046, USA
Henry G. Dietz

About this paper

Cite this paper

Ishizaka, K., Obata, M., Kasahara, H. (2003). Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor. In: Dietz, H.G. (eds) Languages and Compilers for Parallel Computing. LCPC 2001. Lecture Notes in Computer Science, vol 2624. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35767-X_23

DOI: https://doi.org/10.1007/3-540-35767-X_23
Published: 13 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04029-3
Online ISBN: 978-3-540-35767-4
eBook Packages: Springer Book Archive
