Balanced, Locality-Based Parallel Irregular Reductions

Gutiérrez, Eladio; Plata, Oscar; Zapata, Emilio L.

doi:10.1007/3-540-35767-X_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2624)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Much recent effort has been devoted to parallelizing irregular reductions efficiently. The parallelization techniques proposed in recent years can be classified into two groups: LPO (Loop Partitioning Oriented) methods and DPO (Data Partitioning Oriented) methods. We have analyzed both classes in terms of a set of performance aspects: data locality, memory overhead, parallelism and workload balancing. Load balancing has not been sufficiently analyzed in the literature on parallel reduction methods, especially for those in the DPO class. In this paper we propose two techniques to introduce load balancing into a DPO method. The first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance that appears when there is a large number of write operations on small regions of the reduction arrays. Efficient implementations of the proposed load-balancing solutions are presented for an example DPO method. Experiments on static and dynamic kernel codes were conducted, comparing the results with other parallel reduction methods.
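
To make the two classes concrete, the following is a minimal sketch, not the authors' implementation. It simulates sequentially (no real threads) an irregular reduction of the form a[idx[k]] += val[k], once with the loop iterations partitioned and private copies merged afterwards (an LPO-style scheme), and once with the reduction array partitioned into contiguous blocks whose owner performs all writes to that block (a DPO-style scheme). All names (reduce_serial, reduce_lpo, reduce_dpo, owner) and the sample data are assumptions chosen for illustration, and the single-subscript loop is a simplification of real irregular codes.

```python
# Illustrative sketch only: a sequential simulation of two partitioning styles.
# Function names and data are hypothetical, not taken from the paper.

def reduce_serial(n, idx, val):
    """Reference irregular reduction: a[idx[k]] += val[k]."""
    a = [0.0] * n
    for j, v in zip(idx, val):
        a[j] += v
    return a

def reduce_lpo(n, idx, val, threads):
    """Loop-partitioned style: split the iterations evenly, accumulate into
    one private copy of the array per thread, then merge the copies."""
    m = len(idx)
    private = [[0.0] * n for _ in range(threads)]
    for t in range(threads):
        lo, hi = t * m // threads, (t + 1) * m // threads
        for k in range(lo, hi):
            private[t][idx[k]] += val[k]
    return [sum(p[j] for p in private) for j in range(n)]

def owner(j, n, threads):
    """Owner of element j when the array is split into contiguous blocks."""
    return j * threads // n

def reduce_dpo(n, idx, val, threads):
    """Data-partitioned style: each thread performs only the writes that fall
    in its own block. Also returns the number of writes done per thread."""
    a = [0.0] * n
    loads = [0] * threads
    for t in range(threads):
        for k in range(len(idx)):
            if owner(idx[k], n, threads) == t:
                a[idx[k]] += val[k]
                loads[t] += 1
    return a, loads

# Sample data with most writes concentrated on a small region (elements 0, 1).
n, threads = 8, 4
idx = [0, 1, 0, 1, 0, 1, 7, 3]
val = [1.0] * len(idx)

expected = reduce_serial(n, idx, val)
lpo_result = reduce_lpo(n, idx, val, threads)
dpo_result, dpo_loads = reduce_dpo(n, idx, val, threads)
print(dpo_loads)
```

In this sketch the loop-partitioned version needs one private copy of the array per thread (memory overhead), while the data-partitioned version needs none but leaves the per-thread write counts in dpo_loads uneven when the writes cluster in one block. That uneven count is the kind of load imbalance the abstract refers to.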

This work was supported by the Ministry of Education and Culture (CICYT), Spain, through grant TIC2000-1658.

Author information

Authors and Affiliations

Department of Computer Architecture, University of Málaga, E-29071, Málaga, Spain
Eladio Gutiérrez, Oscar Plata & Emilio L. Zapata

Editor information

Editors and Affiliations

Electrical and Computer Engineering Department, University of Kentucky, Lexington, KY, 40506-0046, USA
Henry G. Dietz

About this paper

Cite this paper

Gutiérrez, E., Plata, O., Zapata, E.L. (2003). Balanced, Locality-Based Parallel Irregular Reductions. In: Dietz, H.G. (eds) Languages and Compilers for Parallel Computing. LCPC 2001. Lecture Notes in Computer Science, vol 2624. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35767-X_11

DOI: https://doi.org/10.1007/3-540-35767-X_11
Published: 13 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04029-3
Online ISBN: 978-3-540-35767-4
eBook Packages: Springer Book Archive
