Abstract
Much recent effort has been devoted to the efficient parallelization of irregular reductions. The parallelization techniques proposed in recent years can be classified into two groups: LPO (Loop Partitioning Oriented methods) and DPO (Data Partitioning Oriented methods). We have analyzed both classes in terms of a set of performance aspects: data locality, memory overhead, parallelism and workload balancing. Load balancing has not been sufficiently analyzed in the literature on parallel reduction methods, especially for those in the DPO class. In this paper we propose two techniques to introduce load balancing into a DPO method. The first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance that appears when there is a large number of write operations on small regions of the reduction arrays. Efficient implementations of the proposed load-balancing solutions for an example DPO method are presented. Experiments on static and dynamic kernel codes were conducted, and the results are compared with those of other parallel reduction methods.
This work was supported by the Ministry of Education and Culture (CICYT), Spain, through grant TIC2000-1658.
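To make the terms in the abstract concrete, the following is a minimal sketch of an irregular reduction and of a data-partitioning-oriented (DPO) way of organizing it. It is an illustration under stated assumptions, not the method proposed in the paper: the block partition, the function names (`sequential_reduction`, `owner_of`, `dpo_reduction`) and the sequential simulation of the parallel parts are all hypothetical choices made for this example. The reduction array is split into contiguous blocks, each block is owned by one part, and every iteration is assigned to the part that owns the element it writes. The per-part iteration counts show how many writes on a small region of the array can unbalance the workload.

```python
from collections import defaultdict


def sequential_reduction(n_elems, idx, vals):
    """Reference irregular reduction: a[idx[i]] += vals[i] for every iteration i."""
    a = [0.0] * n_elems
    for elem, v in zip(idx, vals):
        a[elem] += v
    return a


def owner_of(elem, n_elems, n_parts):
    """Assumed block partition: index of the part that owns array element `elem`."""
    block = (n_elems + n_parts - 1) // n_parts  # ceiling division
    return elem // block


def dpo_reduction(n_elems, idx, vals, n_parts):
    """DPO-style sketch: each part only writes the array block it owns.

    The parts are executed one after another here; in a real parallel
    code each part would be handled by a different thread.
    """
    # Inspector phase: assign each iteration to the owner of its target element.
    work = defaultdict(list)
    for it, elem in enumerate(idx):
        work[owner_of(elem, n_elems, n_parts)].append(it)

    # Executor phase: no two parts write the same element, so no locks
    # or private copies of the reduction array are needed.
    a = [0.0] * n_elems
    for p in range(n_parts):
        for it in work[p]:
            a[idx[it]] += vals[it]

    # Iterations per part: a simple measure of workload balance.
    load = [len(work[p]) for p in range(n_parts)]
    return a, load


if __name__ == "__main__":
    # Five of the eight iterations write element 1, a small "hot" region.
    idx = [0, 1, 1, 1, 1, 1, 6, 7]
    vals = [1.0] * len(idx)
    result, load = dpo_reduction(8, idx, vals, n_parts=4)
    print(result)
    print(load)
```

In this example the uniform block partition gives the first part six of the eight iterations while two parts receive none, which is the kind of imbalance the abstract refers to for DPO methods.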
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gutiérrez, E., Plata, O., Zapata, E.L. (2003). Balanced, Locality-Based Parallel Irregular Reductions. In: Dietz, H.G. (eds) Languages and Compilers for Parallel Computing. LCPC 2001. Lecture Notes in Computer Science, vol 2624. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35767-X_11
DOI: https://doi.org/10.1007/3-540-35767-X_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04029-3
Online ISBN: 978-3-540-35767-4
eBook Packages: Springer Book Archive