Abstract
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce Local Write, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates the need for buffers or synchronized writes, but may replicate computation. We investigate the impact of connectivity (node/edge ratio), locality (accesses to local data) and adaptivity (edge modifications) on their relative performance. Local Write improves performance by 50–150% compared to using replicated buffers, and can match or exceed gather/scatter for applications with low locality or high adaptivity.
This research was supported by NSF CAREER Development Award #ASC9625531 in New Technologies. The IBM SP-2 and DEC Alpha Cluster were provided by NSF CISE Institutional Infrastructure Award #CDA9401151 and grants from IBM and DEC.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
T. Autrey and M. Wolfe. Initial results for glacial variable analysis. In D. Sehr, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Ninth International Workshop (LCPC’96), volume 1239 of Lecture Notes in Computer Science. Springer-Verlag, Santa Clara, CA, 1996.
S. Chandra and J. R. Larus. Optimizing communication in HPF programs for finegrain distributed shared memory. In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Las Vegas, NV, June 1997.
A. Cox, S. Dwarkadas, H. Lu, and W. Zwaenepoel. Evaluating the performance of software distributed shared memory as a target for parallelizing compilers. In Proceedings of the 11th International Parallel Processing Symposium, Geneva, Switzerlan, April 1997.
R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462–479, September 1994.
S. Dwarkadas, A. Cox, and W. Zwaenepoel. An integrated compile-time-runtime software distributed shared memory system. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), Boston, MA, October 1996.
M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Detecting coarsegrain parallelism using an interprocedural parallelizing compiler. In Proceedings of Supercomputing’ 95, San Diego, CA, December 1995.
H. Han and C.-W. Tseng. Compile-time synchronization optimizations for software DSMs. In Proceedings of the 12th International Parallel Processing Symposium, Orlando, FL, April 1998.
R. v. Hanxleden. Handling irregular problems with Fortran D — A preliminary report. In Proceedings of the Fourth Workshop on Compilers for Parallel Computers, Delft, The Netherlands, December 1993. 187, 188, 193, 194
R. v. Hanxleden and K. Kennedy. Give-N-Take — A balanced code placement framework. In Proceedings of the SIGPLAN’ 94 Conference on Programming Language Design and Implementation, Orlando, FL, June 1994.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66–80, August 1992.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Preliminary experiences with the Fortran D compiler. In Proceedings of Supercomputing’ 93, Portland, OR, November 1993.
Y.-S. Hwang, B. Moon, S. Sharma, R. Ponnusamy, R. Das, and J. Saltz. Runtime and language support for compiling adaptive irregular programs on distributed memory machines. Software—Practice and Experience, 25(6):597–621, June 1995.
P. Keleher. Update protocols and iterative scientific applications. In Proceedings of the 12th International Parallel Processing Symposium, Orlando, FL, April 1998.
P. Keleher and C.-W. Tseng. Enhancing software DSM for compiler-parallelized applications. In Proceedings of the 11th International Parallel Processing Symposium, Geneva, Switzerland, April 1997.
A. Lain and P. Banerjee. Exploiting spatial regularity in irregular iterative applications. In Proceedings of the 9th International Parallel Processing Symposium, Santa Barbara, CA, April 1995.
B. Lu and J. Mellor-Crummey. Compiler optimization of implicit reductions for distributed memory multiprocessors. In Proceedings of the 12th International Parallel Processing Symposium, Orlando, FL, April 1998.
H. Lu, A. Cox, S. Dwarkadas, R. Rajamony, and W. Zwaenepoel. Compiler and software distributed shared memory support for irregular applications. In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Las Vegas, NV, June 1997.
S. Mukherjee, S. Sharma, M. Hill, J. Larus, A. Rogers, and J. Saltz. Efficient support for irregular applications on distributed-memory machines. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Santa Barbara, CA, July 1995.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, H., Tseng, CW. (1999). Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes. In: Chatterjee, S., et al. Languages and Compilers for Parallel Computing. LCPC 1998. Lecture Notes in Computer Science, vol 1656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48319-5_12
Download citation
DOI: https://doi.org/10.1007/3-540-48319-5_12
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66426-0
Online ISBN: 978-3-540-48319-9
eBook Packages: Springer Book Archive