Parallelizing for a good node performance

Hernández, Emilio

doi:10.1007/3-540-61142-8_593


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1067)

Included in the following conference series:

International Conference on High-Performance Computing and Networking


Abstract

Current practice in parallelizing programs for MIMD machines, by programmers and parallelizing compilers alike, is to partition the computation so that communication is minimized; optimization for single-node performance is then performed as a second step. This two-step procedure may not deliver optimal parallel performance. Good performance on tightly coupled parallel machines relies increasingly on good utilization of single-node resources, such as cache memories and vector units. In this paper we present evidence of the importance of using single-node resources efficiently when deciding how to parallelize a program.
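Why a communication-first choice can miss the best overall plan can be illustrated with a toy cost model. The Python below is a minimal sketch under stated assumptions, not the paper's method or data: it assumes a 5-point stencil sweep over an n x n grid on p nodes, row-major local arrays, a linear message cost (alpha per message plus beta per word), and Hockney's model in which a vector loop of length m takes (m + n_1/2) / r_inf. The names `Node`, `partition_shape`, `comm_time`, `comp_time`, `total_time` and all parameter values are hypothetical, and cache effects are ignored entirely.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node and network parameters (illustrative, not measured)."""
    alpha: float   # start-up time per message (s)
    beta: float    # transfer time per word (s)
    r_inf: float   # asymptotic vector rate (grid-point updates per second)
    n_half: float  # Hockney's n_1/2: loop length that reaches half of r_inf


def partition_shape(n: int, p: int, scheme: str) -> tuple[int, int, int, int]:
    """Return (outer trips, inner trips, messages, halo words) for one
    5-point stencil sweep on an interior node. Local arrays are row-major,
    so the inner loop runs along a local row."""
    if scheme == "rows":        # 1-D block of rows: (n/p) x n per node
        if n % p:
            raise ValueError("p must divide n")
        return n // p, n, 2, 2 * n
    if scheme == "blocks":      # 2-D blocks: (n/q) x (n/q) per node, q*q == p
        q = math.isqrt(p)
        if q * q != p or n % q:
            raise ValueError("p must be a perfect square whose root divides n")
        return n // q, n // q, 4, 4 * (n // q)
    raise ValueError(f"unknown scheme: {scheme}")


def comm_time(node: Node, n: int, p: int, scheme: str) -> float:
    # Linear model: alpha per message plus beta per halo word exchanged.
    _, _, msgs, words = partition_shape(n, p, scheme)
    return msgs * node.alpha + words * node.beta


def comp_time(node: Node, n: int, p: int, scheme: str) -> float:
    # Hockney model: each inner loop of length m costs (m + n_half) / r_inf,
    # so short inner loops pay the vector start-up overhead more often.
    outer, inner, _, _ = partition_shape(n, p, scheme)
    return outer * (inner + node.n_half) / node.r_inf


def total_time(node: Node, n: int, p: int, scheme: str) -> float:
    return comm_time(node, n, p, scheme) + comp_time(node, n, p, scheme)


if __name__ == "__main__":
    node = Node(alpha=1e-5, beta=1e-7, r_inf=1e8, n_half=100.0)
    n, p = 1024, 16
    schemes = ("rows", "blocks")
    # Step 1 of the two-step procedure: minimize communication only.
    comm_first = min(schemes, key=lambda s: comm_time(node, n, p, s))
    # Joint choice: minimize communication plus node computation.
    joint = min(schemes, key=lambda s: total_time(node, n, p, s))
    for s in schemes:
        print(f"{s:6s} comm={comm_time(node, n, p, s):.3e}s "
              f"total={total_time(node, n, p, s):.3e}s")
    print("communication-first choice:", comm_first)  # blocks
    print("joint choice:", joint)                      # rows
```

With these hypothetical values, 2-D blocks exchange fewer halo words, so a communication-first strategy selects them; but their inner loops are 256 updates long instead of 1024, so each node starts four times as many vector loops, and 1-D row strips finish sooner overall. Other parameter values can reverse the outcome; the sketch only shows why the partitioning and the node-level behavior are better decided together.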

Author information

Authors and Affiliations

Department of Electronics and Computer Science, University of Southampton, SO17 1BJ, Southampton, UK
Emilio Hernández

Editor information

Heather Liddell, Adrian Colbrook, Bob Hertzberger, Peter Sloot

About this paper

Cite this paper

Hernández, E. (1996). Parallelizing for a good node performance. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_593

DOI: https://doi.org/10.1007/3-540-61142-8_593
Published: 18 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61142-4
Online ISBN: 978-3-540-49955-8
eBook Packages: Springer Book Archive