An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications

Iyer, Ravi; Perdue, Jack; Rauchwerger, Lawrence; Amato, Nancy M.; Bhuyan, Laxmi

doi:10.1007/s10766-004-1187-0


Published: August 2005

Volume 33, pages 307–350, (2005)

International Journal of Parallel Programming

Ravi Iyer¹,
Jack Perdue²,
Lawrence Rauchwerger²,
Nancy M. Amato² &
Laxmi Bhuyan³


Abstract

As processor technology continues to advance at a rapid pace, memory access latency has become the principal performance bottleneck of shared memory systems. To understand the effects of the cache and memory hierarchy on system latencies, performance analysts run benchmark analyses on existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000. Our goal is to compare and contrast the design techniques used in these multiprocessors. We present the impact of processor design, cache/memory hierarchies, and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads, such as process creation and synchronization, on their user-level performance. Our experimental methodology uses microbenchmarks as well as scientific applications to characterize user-level performance. Our microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies; the effect of coherence protocol design and optimizations and of data sharing patterns on multiprocessor memory access latencies; and, finally, the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns, and number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation between system-level performance metrics and the application’s execution time.
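The abstract does not describe how the microbenchmarks are built. As an illustration only, a common way to expose load latency as a function of working-set size is pointer chasing: each access depends on the result of the previous one, so the accesses cannot overlap. The sketch below is not the authors' code. The names `build_chain` and `chase` are hypothetical, and because Python's interpreter overhead dominates the timings, it shows the structure of the measurement rather than real hardware latencies. A practical version would be written in C.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list nxt where following i -> nxt[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm, which yields a random single-cycle permutation,
    so the access order is hard for a hardware prefetcher to predict.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Perform `steps` dependent accesses; return (final index, seconds per access)."""
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]  # the next address depends on this load
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps


if __name__ == "__main__":
    # Sweep the working-set size (in slots); the sizes are arbitrary choices.
    for n in (1 << 10, 1 << 14, 1 << 18):
        chain = build_chain(n)
        _, per_access = chase(chain, 200_000)
        print(f"{n:>8} slots: {per_access * 1e9:.1f} ns per dependent access")
```

In a compiled implementation, plotting time per access against working-set size typically shows steps where the working set outgrows each cache level or the reach of the TLB.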


References

G. Abandah and E. Davidson, Characterizing Distributed Shared Memory Performance: Case Study of the Convex SPP1000, IEEE Transactions on Parallel and Distributed Systems, 9(2)(1998).
G. Abandah and E. Davidson, Effects of Architectural Trends and Technological Advances on the HP/Convex Exemplar’s Memory and Communication Performance, 25th Annual International Symposium on Computer Architecture, pp. 318–329 (1998).
N. M. Amato, J. Perdue, A. Pietracaprina, G. Pucci, and M. Mathis, Predicting Performance on SMPs. A Case Study: The SGI Power Challenge, International Parallel and Distributed Processing Symposium (IPDPS), Cancun, Mexico (2000).
L. M. Censier and P. Feautrier, A New Solution to Coherence Problems in Multicache Systems, IEEE Transactions on Computers, C-27(12):1112–1118 (1978).
D. Culler, R. Karp, D. Patterson, A. Sahay, E. Santos, et al., LogP: a Practical Model for Computation, Communications of the ACM, 39(11):78–85 (1996). doi:10.1145/240455.240477
CXperf User’s Guide. Hewlett-Packard Corp. http://docs.hp.com/hpux/onlinedocs/B6323-96001/B6323-96001.html
HP-UX man page for chatr. HP.
HP-UX man page for pstat_getprocvm(). HP.
HP RISC Precision Architecture 2.0 (PA-RISC 2.0) Document, Hewlett-Packard Corporation, http://wwwhp.com/ahp/framed/technology/micropro/
HP 9000 V-Class Server Architecture Document, 2nd Edition, Hewlett-Packard Corporation, http://docs.hp.com:80/hpux/systems/#vclass.
C. Hristea and D. Lenoski, Measuring Memory Hierarchy Performance of Cache-Coherent Multiprocessors Using Micro Benchmarks, Proceedings of Supercomputing: High Performance Networking and Computing (1997).
R. Iyer, G. Janakiraman, R. Kumar, and L. Bhuyan, A Trace-Driven Analysis of Sharing Behavior in TPC-C, 2nd Workshop on Computer Architecture Evaluation using Commercial Workloads (1999).
R. Iyer, N. M. Amato, L. Rauchwerger, and L. Bhuyan, Comparing the Memory System Performance of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications, Proceedings of the 13th ACM International Conference on Supercomputing (ICS’ 99), pp. 339–347 (June, 1999).
IRIX man page for perfex. SGI.
IRIX man page for r10k_counters. SGI.
D. Jiang and J. P. Singh, Scaling Application Performance on a Cache-coherent Multiprocessor, Proceedings of the 26th International Symposium on Computer Architecture (ISCA), Atlanta, (May, 1999).
D. Jiang and J. P. Singh, A Scaling Study of the SGI Origin2000: A Hardware Cache-coherent Multiprocessor, 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, (1999).
J. Laudon and D. Lenoski, The SGI Origin: A ccNUMA Highly Scalable Server, Proceedings of the 24th International Symposium on Computer Architecture, pp. 241–251, (June, 1997).
D. Lenoski et al., The Stanford DASH Multiprocessor, IEEE Computer, 25(3):63–79 (1992).
T. Lovett and R. Clapp, STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 23rd Annual International Symposium on Computer Architecture, pp. 308–317 (1996).
C. Mather, K. Peterson, B. Raghunath, J. Reddy, I. Subramanian, B. Taylor, and E. Wong (all from Hewlett Packard), Performance Optimized Page Sizing in HP-UX 11.0, IWorks 1998 Presentation.
J. D. McCalpin, Memory Bandwidth and Machine Balance in Current High Performance Computers, IEEE Technical Committee on Computer Architecture newsletter (1995).
L. McVoy and C. Staelin, lmbench: Portable Tools for Performance Analysis, Proceedings of USENIX, San Diego (1996).
Parasol - HP V-Class Multiprocessor, Parasol Lab, Department of Computer Science, Texas A&M University, http://www.cs.tamu.edu/research/parasol.
R. Saavedra, R. Gaines, and M. Carlton, Characterizing the Performance Space of Shared Memory Computers Using Micro-Benchmarks, Technical Report # USC-CS-92-547, Department of Computer Science, University of Southern California (1993).
K. Shaw and G. Astfalk, Four State Cache Coherence Protocol in the Convex Exemplar System, http://www.hp.com/wsg/tech/technical.html.
Titan - SGI Origin 2000, Supercomputing Center, Texas A&M University, http://anakin.tamu.edu/titan/
L. Valiant, A Bridging Model for Parallel Computation, Communications of the ACM, 33(8):103–111 (1990). doi:10.1145/79173.79181
K. Yeager, The MIPS R10000 Superscalar Microprocessor, IEEE Micro, 16(2):28–40 (1996). doi:10.1109/40.491460
M. Zagha, B. Larson, S. Turner, and M. Itzkowitz, Performance Analysis Using MIPS R10000 Performance Counters, Proceedings of Supercomputing’96, November 17–22, 1996, Pittsburgh, PA, ACM Press and IEEE Computer Society Press (1996).


Author information

Authors and Affiliations

Intel Corporation, USA
Ravi Iyer
Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, 77843-3112, USA
Jack Perdue, Lawrence Rauchwerger & Nancy M. Amato
Department of Computer Science and Engineering, University of California Riverside, Riverside, CA, 92521, USA
Laxmi Bhuyan


Corresponding author

Correspondence to Lawrence Rauchwerger.

Additional information

A preliminary version of this paper appeared in the 13th ACM International Conference on Supercomputing (ICS’99).⁽¹³⁾ This work was done while Iyer and Bhuyan were at Texas A&M. It was supported in part by a Hewlett-Packard Equipment Grant. Amato and Rauchwerger were supported in part by NSF Grants ACI-9872126, EIA-9975018, EIA-0103742, EIA-9805823, ACR-0081510, ACR-0113971, CCR-0113974, EIA-9810937, EIA-0079874, by the DOE ASCI ASAP program, and by the Texas Higher Education Coordinating Board grant ATP-000512-0261-2001. Perdue was supported in part by a Dept. of Education Graduate Fellowship (GAANN).


About this article

Cite this article

Iyer, R., Perdue, J., Rauchwerger, L. et al. An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications. Int J Parallel Prog 33, 307–350 (2005). https://doi.org/10.1007/s10766-004-1187-0


Received: 13 April 2004
Accepted: 29 October 2004
Issue Date: August 2005
DOI: https://doi.org/10.1007/s10766-004-1187-0
