Balanced, Locality-Based Parallel Irregular Reductions

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2624))

Abstract

Much effort has been devoted recently to the efficient parallelization of irregular reductions. The parallelization techniques proposed in recent years can be classified into two groups: LPO (Loop Partitioning Oriented) methods and DPO (Data Partitioning Oriented) methods. We have analyzed both classes in terms of a set of performance aspects: data locality, memory overhead, parallelism and workload balancing. Load balancing has not been sufficiently analyzed in the literature on parallel reduction methods, especially for those in the DPO class. In this paper we propose two techniques to introduce load balancing into a DPO method. The first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance, which appears when a large number of write operations fall on small regions of the reduction arrays. Efficient implementations of the proposed load-balancing solutions are presented for an example DPO method. Experiments on static and dynamic kernel codes were conducted, with comparisons against other parallel reduction methods.
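
As an illustration of the DPO idea and of why load balance matters, the following is a minimal sketch, not the method proposed in the paper. It assumes a reduction of the form a[idx[i]] += vals[i], splits the reduction array into contiguous blocks, and assigns each loop iteration to the block that owns the element it writes. All function names (sequential_reduction, uniform_bounds, balanced_bounds, dpo_reduction) are hypothetical, and the blocks are processed one after another here, whereas a real implementation would give each block to its own thread.

```python
from bisect import bisect_right


def sequential_reduction(n, idx, vals):
    """Reference irregular reduction: a[idx[i]] += vals[i]."""
    a = [0.0] * n
    for i, v in zip(idx, vals):
        a[i] += v
    return a


def uniform_bounds(n, p):
    """Equal-sized blocks; block k owns elements [bounds[k], bounds[k+1])."""
    return [k * n // p for k in range(p + 1)]


def balanced_bounds(n, p, idx):
    """Blocks sized so that each one receives roughly len(idx)/p writes."""
    writes = [0] * n
    for i in idx:
        writes[i] += 1
    total, running, bounds = len(idx), 0, [0]
    for e in range(n):
        running += writes[e]
        # Close a block each time the cumulative write count reaches the next quota.
        while len(bounds) < p and running * p >= len(bounds) * total:
            bounds.append(e + 1)
    bounds += [n] * (p + 1 - len(bounds))  # last block ends at n
    return bounds


def dpo_reduction(n, idx, vals, bounds):
    """Data-partitioned reduction: iterations are grouped by the owning block."""
    p = len(bounds) - 1
    work = [[] for _ in range(p)]
    for it, i in enumerate(idx):
        work[bisect_right(bounds, i) - 1].append(it)  # block that owns element i
    a = [0.0] * n
    for k in range(p):          # each k stands in for one thread
        for it in work[k]:      # writes only touch block k, so no conflicts
            a[idx[it]] += vals[it]
    return a, [len(w) for w in work]


# Skewed access pattern: most writes hit elements 0 and 1.
idx = [0, 0, 0, 0, 0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
vals = [1.0] * len(idx)

_, uniform_load = dpo_reduction(8, idx, vals, uniform_bounds(8, 2))
_, balanced_load = dpo_reduction(8, idx, vals, balanced_bounds(8, 2, idx))
print(uniform_load)   # [10, 4]
print(balanced_load)  # [8, 6]
```

With equal-sized blocks the first block performs 10 of the 14 updates; moving the boundary according to the write counts narrows this to 8 and 6. Because this sketch partitions at element granularity, a single very heavily written element cannot be split across blocks, which is the kind of situation the second technique in the abstract is described as addressing.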

This work was supported by the Ministry of Education and Culture (CICYT), Spain, through grant TIC2000-1658.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gutiérrez, E., Plata, O., Zapata, E.L. (2003). Balanced, Locality-Based Parallel Irregular Reductions. In: Dietz, H.G. (eds) Languages and Compilers for Parallel Computing. LCPC 2001. Lecture Notes in Computer Science, vol 2624. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35767-X_11

  • DOI: https://doi.org/10.1007/3-540-35767-X_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-04029-3

  • Online ISBN: 978-3-540-35767-4

  • eBook Packages: Springer Book Archive
