
High-level Language Support for User-defined Reductions

Published in The Journal of Supercomputing

Abstract

The optimized handling of reductions on parallel supercomputers or clusters of workstations is critical to high performance because reductions are common in scientific codes and a potential source of bottlenecks. Yet in many high-level languages, a mechanism for writing efficient reductions remains surprisingly absent. Further, when such mechanisms do exist, they often do not provide the flexibility a programmer needs to achieve a desirable level of performance. In this paper, we present a new language construct for arbitrary reductions that lets a programmer achieve a level of performance equal to that achievable with the highly flexible but low-level combination of Fortran and MPI. We have implemented this construct in the ZPL language and evaluate it in the context of the initialization of the NAS MG benchmark. We show a 45-fold speedup over the same code written in ZPL without this construct. In addition, performance on a large number of processors surpasses that achieved in the NAS implementation, showing that our mechanism provides programmers with the needed flexibility.
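The kind of reduction the paper targets can be illustrated outside ZPL. The NAS MG benchmark's initialization locates the grid positions holding the extreme values of a random array, a task naturally expressed as a user-defined reduction whose combining operator merges two partial "top-k" lists. The sketch below (plain Python, not the paper's ZPL construct; the function names and the choice of k are illustrative) shows why such an operator parallelizes: because the combiner is associative, partial results from different processors can be merged in any tree order.

```python
import heapq
from functools import reduce

def top_k_merge(k):
    """Return an associative combiner that merges two partial
    top-k lists of (value, index) pairs into a single top-k list."""
    def combine(a, b):
        # heapq.nlargest returns the k largest pairs, sorted descending
        return heapq.nlargest(k, a + b)
    return combine

def local_top_k(values, offset, k):
    """Per-processor partial result: the k largest local values,
    tagged with their global indices."""
    return heapq.nlargest(k, ((v, offset + i) for i, v in enumerate(values)))

# Simulate four processors, each owning a contiguous chunk of the array.
data = [3.1, 0.5, 9.7, 4.2, 8.8, 1.0, 7.3, 6.6]
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
partials = [local_top_k(chunk, 2 * p, k=3) for p, chunk in enumerate(chunks)]

# The global reduction: any combining order gives the same answer,
# which is what lets a runtime evaluate it as a parallel reduction tree.
result = reduce(top_k_merge(3), partials)
print([i for _, i in result])  # → [2, 4, 6], indices of the 3 largest values
```

A low-level MPI program would express the same idea by registering the combiner with `MPI_Op_create` and calling `MPI_Allreduce`; the paper's point is that a high-level language can expose the same flexibility without the boilerplate.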




Cite this article

Deitz, S.J., Chamberlain, B.L. & Snyder, L. High-level Language Support for User-defined Reductions. The Journal of Supercomputing 23, 23–37 (2002). https://doi.org/10.1023/A:1015781018449
