
The performance impact of granularity control and functional parallelism

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1033)

Abstract

Task granularity and functional parallelism are fundamental issues in the optimization of parallel programs. The appropriate granularity for exploiting parallelism depends on characteristics of both the program and the execution environment. In this paper we demonstrate the efficacy of dynamic granularity control. The scheme we propose uses runtime information to select the task size of exploited parallelism at various stages of a program's execution. We also demonstrate that functional parallelism can be an important factor in improving the performance of parallel programs, both in the presence and in the absence of loop-level parallelism. Functional parallelism can increase the amount of large-grain parallelism as well as provide finer-grain parallelism that leads to better load balance. Analytical models and benchmark results quantify the impact of granularity control and functional parallelism. The underlying implementation for this research is a low-overhead threads model based on user-level scheduling.
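As a rough illustration of the idea described in the abstract, the sketch below shows one way dynamic granularity control can work for a self-scheduled parallel loop. The chunk-claiming policy, the constants, and the loop body are assumptions made for this example only; they are not the authors' scheme or their user-level threads runtime.

```cpp
// Hypothetical sketch of dynamic granularity control over a parallel loop:
// each worker claims a chunk whose size is derived from the work remaining
// and the number of workers, so the grain shrinks as the loop drains.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr long kTotal = 1'000'000;   // total loop iterations (assumed workload)
constexpr long kMinChunk = 64;       // floor below which finer grain costs more than it gains
std::atomic<long> next_index{0};     // next unclaimed iteration

// Runtime task-size decision: split the remaining work over the workers,
// but never drop below the minimum profitable grain.
long choose_chunk(long remaining, unsigned workers) {
  long chunk = remaining / (2 * static_cast<long>(workers));
  return chunk > kMinChunk ? chunk : kMinChunk;
}

void worker(unsigned nworkers, std::vector<double>& data) {
  for (;;) {
    long start = next_index.load();
    long remaining = kTotal - start;
    if (remaining <= 0) return;                       // loop fully claimed
    long chunk = choose_chunk(remaining, nworkers);   // task size chosen at run time
    // Try to claim [start, start + chunk); on a race, retry with a fresh start.
    if (!next_index.compare_exchange_weak(start, start + chunk)) continue;
    long end = (start + chunk < kTotal) ? start + chunk : kTotal;
    for (long i = start; i < end; ++i)
      data[i] = 0.5 * static_cast<double>(i);         // stand-in loop body
  }
}

int main() {
  unsigned nworkers = std::thread::hardware_concurrency();
  if (nworkers == 0) nworkers = 4;
  std::vector<double> data(kTotal);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nworkers; ++t)
    pool.emplace_back(worker, nworkers, std::ref(data));
  for (auto& th : pool) th.join();
  std::printf("processed %ld iterations with %u workers\n", kTotal, nworkers);
}
```

Early in the loop the chunks are large, which keeps scheduling overhead low; as the remaining work shrinks, so do the chunks, spreading the tail of the loop across idle workers for better load balance. That overhead-versus-balance trade-off is what the paper quantifies with analytical models and benchmarks.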

This work was supported by the Office of Naval Research under grant N00014-94-1-0234. Computational facilities were provided by the National Center for Supercomputing Applications. José Moreira was at the University of Illinois during the development of this research.

Editor information

Chua-Huang Huang, Ponnuswamy Sadayappan, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreira, J.E., Schouten, D., Polychronopoulos, C. (1996). The performance impact of granularity control and functional parallelism. In: Huang, CH., Sadayappan, P., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1995. Lecture Notes in Computer Science, vol 1033. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0014225

  • DOI: https://doi.org/10.1007/BFb0014225

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60765-6

  • Online ISBN: 978-3-540-49446-1

