Abstract
As processor technology continues to advance rapidly, memory access latency has become the principal performance bottleneck of shared-memory systems. To understand how cache and memory hierarchies affect system latencies, performance analysts benchmark existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000, with the goal of comparing and contrasting the design techniques used in them. We present the impact of processor design, cache/memory hierarchies, and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads, such as process creation and synchronization, on their user-level performance. Our experimental methodology uses microbenchmarks as well as scientific applications to characterize user-level performance. Our microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies; the effect of coherence protocol design and optimizations, and of data sharing patterns, on multiprocessor memory access latencies; and the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns, and the number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation between system-level performance metrics and the application’s execution time.
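The abstract does not list the microbenchmark code itself. As a rough illustration of how a load-latency microbenchmark of this kind is commonly structured, the sketch below builds a pointer-chasing chain in which every load depends on the previous one, then times a traversal for several working-set sizes. The function names (`build_chain`, `chase`, `time_per_load`) and the sizes are hypothetical and not taken from the paper. In Python the interpreter overhead dominates the timings, so the numbers only hint at cache effects; a real measurement would use the same structure in C or assembly.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list nxt such that following nxt[i] from 0 visits all n slots.

    Sattolo's algorithm produces a permutation with a single cycle, so the
    traversal cannot get stuck in a short loop that stays cache-resident.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what forces a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent loads and return the final index."""
    i = start
    for _ in range(steps):
        i = nxt[i]  # each load needs the result of the previous one
    return i


def time_per_load(n, steps=200_000):
    """Rough seconds per dependent load for a working set of n slots."""
    nxt = build_chain(n)
    chase(nxt, n)  # warm-up pass over the whole chain
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) / steps


if __name__ == "__main__":
    # Hypothetical working-set sizes; pick sizes that straddle the cache levels
    # of the machine under test.
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(f"{n:>8} slots: {time_per_load(n) * 1e9:.1f} ns per load")
```

The random single-cycle ordering is a deliberate choice: a sequential walk would let hardware prefetching hide most of the latency that the benchmark is meant to expose.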
References
G. Abandah and E. Davidson, Characterizing Distributed Shared Memory Performance: Case Study of the Convex SPP1000, IEEE Transactions on Parallel and Distributed Systems, 9(2)(1998).
G. Abandah and E. Davidson, Effects of Architectural Trends and Technological Advances on the HP/Convex Exemplar’s Memory and Communication Performance, 25th Annual International Symposium on Computer Architecture, pp. 318–329 (1998).
N. M. Amato, J. Perdue, A. Pietracaprina, G. Pucci, and M. Mathis, Predicting Performance on SMPs. A Case Study: The SGI Power Challenge, International Parallel and Distributed Processing Symposium (IPDPS), Cancun, Mexico (2000).
L. M. Censier and P. Feautrier, A New Solution to Coherence Problems in Multicache Systems, IEEE Transactions on Computers, C-27(12):1112–1118 (1978).
D. Culler, R. Karp, D. Patterson, A. Sahay, E. Santos, et al., LogP: A Practical Model of Parallel Computation, Communications of the ACM, 39(11):78–85 (1996). doi:10.1145/240455.240477
CXperf User’s Guide, Hewlett-Packard Corp., http://docs.hp.com/hpux/onlinedocs/B6323-96001/B6323-96001.html
HP-UX man page for chatr. HP.
HP-UX man page for pstat_getprocvm(). HP.
HP RISC Precision Architecture 2.0 (PA-RISC 2.0) Document, Hewlett-Packard Corporation, http://wwwhp.com/ahp/framed/technology/micropro/
HP 9000 V-Class Server Architecture Document, 2nd Edition, Hewlett-Packard Corporation, http://docs.hp.com:80/hpux/systems/#vclass.
C. Hristea and D. Lenoski, Measuring Memory Hierarchy Performance of Cache-Coherent Multiprocessors Using Micro Benchmarks, Proceedings of Supercomputing: High Performance Networking and Computing (1997).
R. Iyer, G. Janakiraman, R. Kumar, and L. Bhuyan, A Trace-Driven Analysis of Sharing Behavior in TPC-C, 2nd Workshop on Computer Architecture Evaluation using Commercial Workloads (1999).
R. Iyer, N. M. Amato, L. Rauchwerger, and L. Bhuyan, Comparing the Memory System Performance of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications, Proceedings of the 13th ACM International Conference on Supercomputing (ICS’ 99), pp. 339–347 (June, 1999).
IRIX man page for perfex. SGI.
IRIX man page for r10k_counters. SGI.
D. Jiang and J. P. Singh, Scaling Application Performance on a Cache-coherent Multiprocessor, Proceedings of the 26th International Symposium on Computer Architecture (ISCA), Atlanta (May, 1999).
D. Jiang and J. P. Singh, A Scaling Study of the SGI Origin2000: A Hardware Cache-coherent Multiprocessor, 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio (1999).
J. Laudon and D. Lenoski, The SGI Origin: A ccNUMA Highly Scalable Server, Proceedings of the 24th International Symposium on Computer Architecture, pp. 241–251, (May, 1996).
D. Lenoski et al., The Stanford DASH Multiprocessor, IEEE Computer, 25(3):63–79 (1992).
T. Lovett and R. Clapp, STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 23rd Annual International Symposium on Computer Architecture, pp. 308–317 (1996).
C. Mather, K. Peterson, B. Raghunath, J. Reddy, I. Subramanian, B. Taylor, and E. Wong (all from Hewlett Packard), Performance Optimized Page Sizing in HP-UX 11.0, IWorks 1998 Presentation.
J. D. McCalpin, Memory Bandwidth and Machine Balance in Current High Performance Computers, IEEE Technical Committee on Computer Architecture newsletter (1995).
L. McVoy and C. Staelin, lmbench: Portable Tools for Performance Analysis, Proceedings of USENIX, San Diego (1996).
Parasol - HP V-Class Multiprocessor, Parasol Lab, Department of Computer Science, Texas A&M University, http://www.cs.tamu.edu/research/parasol.
R. Saavedra, R. Gaines, and M. Carlton, Characterizing the Performance Space of Shared Memory Computers Using Micro-Benchmarks, Technical Report # USC-CS-92-547, Department of Computer Science, University of Southern California (1993).
K. Shaw and G. Astfalk, Four State Cache Coherence Protocol in the Convex Exemplar System, http://www.hp.com/wsg/tech/technical.html.
Titan - SGI Origin 2000, Supercomputing Center, Texas A&M University, http://anakin.tamu.edu/titan/
L. Valiant, A Bridging Model for Parallel Computation, Communications of the ACM, 33(8):103–111 (1990). doi:10.1145/79173.79181
K. Yeager, The MIPS R10000 Superscalar Microprocessor, IEEE Micro, 16(2):28–40 (1996). doi:10.1109/40.491460
M. Zagha, B. Larson, S. Turner, and M. Itzkowits, Performance Analysis Using MIPS R10000 Performance Counters, Proceedings of Supercomputing’96, November 17–22, 1996 Pittsburgh, PA, ACM Press and IEEE Computer Society Press, 1996.
Additional information
A preliminary version of this paper appeared in the 13th ACM International Conference on Supercomputing (ICS’99) (13). This work was done while Iyer and Bhuyan were at Texas A&M. It was supported in part by a Hewlett-Packard Equipment Grant. Amato and Rauchwerger were supported in part by NSF Grants ACI-9872126, EIA-9975018, EIA-0103742, EIA-9805823, ACR-0081510, ACR-0113971, CCR-0113974, EIA-9810937, EIA-0079874, by the DOE ASCI ASAP program, and by the Texas Higher Education Coordinating Board grant ATP-000512-0261-2001. Perdue was supported in part by a Dept. of Education Graduate Fellowship (GAANN).
Cite this article
Iyer, R., Perdue, J., Rauchwerger, L. et al. An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications. Int J Parallel Prog 33, 307–350 (2005). https://doi.org/10.1007/s10766-004-1187-0
DOI: https://doi.org/10.1007/s10766-004-1187-0