
The performance impact of granularity control and functional parallelism

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1033)

Abstract

Task granularity and functional parallelism are fundamental issues in the optimization of parallel programs. The appropriate granularity for exploiting parallelism depends on characteristics of both the program and the execution environment. In this paper we demonstrate the efficacy of dynamic granularity control. The scheme we propose uses runtime information to select the task size of exploited parallelism at various stages of a program's execution. We also demonstrate that functional parallelism can be an important factor in improving the performance of parallel programs, both in the presence and in the absence of loop-level parallelism. Functional parallelism can increase the amount of large-grain parallelism as well as provide finer-grain parallelism that leads to better load balance. Analytical models and benchmark results quantify the impact of granularity control and functional parallelism. The underlying implementation for this research is a low-overhead threads model based on user-level scheduling.
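As a rough illustration of the idea described in the abstract, the sketch below shows one way dynamic granularity control can work for a self-scheduled parallel loop. The chunk-claiming policy, the constants, and the loop body are assumptions made for this example only; they are not the authors' scheme or their user-level threads runtime.

```cpp
// Hypothetical sketch of dynamic granularity control over a parallel loop:
// each worker claims a chunk whose size is derived from the work remaining
// and the number of workers, so the grain shrinks as the loop drains.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr long kTotal = 1'000'000;   // total loop iterations (assumed workload)
constexpr long kMinChunk = 64;       // floor below which finer grain costs more than it gains
std::atomic<long> next_index{0};     // next unclaimed iteration

// Runtime task-size decision: split the remaining work over the workers,
// but never drop below the minimum profitable grain.
long choose_chunk(long remaining, unsigned workers) {
  long chunk = remaining / (2 * static_cast<long>(workers));
  return chunk > kMinChunk ? chunk : kMinChunk;
}

void worker(unsigned nworkers, std::vector<double>& data) {
  for (;;) {
    long start = next_index.load();
    long remaining = kTotal - start;
    if (remaining <= 0) return;                       // loop fully claimed
    long chunk = choose_chunk(remaining, nworkers);   // task size chosen at run time
    // Try to claim [start, start + chunk); on a race, retry with a fresh start.
    if (!next_index.compare_exchange_weak(start, start + chunk)) continue;
    long end = (start + chunk < kTotal) ? start + chunk : kTotal;
    for (long i = start; i < end; ++i)
      data[i] = 0.5 * static_cast<double>(i);         // stand-in loop body
  }
}

int main() {
  unsigned nworkers = std::thread::hardware_concurrency();
  if (nworkers == 0) nworkers = 4;
  std::vector<double> data(kTotal);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nworkers; ++t)
    pool.emplace_back(worker, nworkers, std::ref(data));
  for (auto& th : pool) th.join();
  std::printf("processed %ld iterations with %u workers\n", kTotal, nworkers);
}
```

Early in the loop the chunks are large, which keeps scheduling overhead low; as the remaining work shrinks, so do the chunks, spreading the tail of the loop across idle workers for better load balance. That overhead-versus-balance trade-off is what the paper quantifies with analytical models and benchmarks.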

This work was supported by the Office of Naval Research under grant N00014-94-1-0234. Computational facilities were provided by the National Center for Supercomputing Applications. José Moreira was at the University of Illinois during the development of this research.

Editor information

Chua-Huang Huang, Ponnuswamy Sadayappan, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreira, J.E., Schouten, D., Polychronopoulos, C. (1996). The performance impact of granularity control and functional parallelism. In: Huang, CH., Sadayappan, P., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1995. Lecture Notes in Computer Science, vol 1033. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0014225

  • DOI: https://doi.org/10.1007/BFb0014225

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60765-6

  • Online ISBN: 978-3-540-49446-1

