Research article
DOI: 10.1145/1362622.1362648

A case for low-complexity MP architectures

Published: 10 November 2007

Abstract

Advances in semiconductor technology have driven shared-memory servers toward processors with multiple cores per die and multiple threads per core. This paper presents simple hardware primitives enabling flexible and low-complexity multi-chip designs supporting an efficient inter-node coherence protocol implemented in software.
We argue that our primitives and the example design presented in this paper have lower hardware overhead, have easier (and later) verification requirements, and provide the opportunity for flexible coherence protocols and simpler protocol bug corrections than traditional designs.
Our evaluation is based on detailed full-system simulations of modern chip-multiprocessors and both commercial and HPC workloads. We compare a low-complexity system based on the proposed primitives with aggressive hardware multi-chip shared-memory systems and show that the performance is competitive across a large design space.
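
The abstract does not spell out the primitives or the protocol, so the following is only a toy illustration of the general idea of running inter-node coherence in software. It assumes a hypothetical per-line permission check standing in for the hardware support, and a plain directory-based MSI protocol in the handlers. All names (`Node`, `System`, `software_read_miss`, `software_write_miss`) are invented for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional

INVALID, SHARED, MODIFIED = "I", "S", "M"

@dataclass
class DirectoryEntry:
    owner: Optional[int] = None              # node holding the line in MODIFIED, if any
    sharers: set = field(default_factory=set)

def software_read_miss(node, addr):
    """Software handler (assumed): obtain a readable copy of addr."""
    system = node.system
    system.handler_calls += 1
    entry = system.directory.setdefault(addr, DirectoryEntry())
    if entry.owner is not None:
        # Downgrade the current owner and write its data back to the home copy.
        owner = system.nodes[entry.owner]
        system.memory[addr] = owner.data[addr]
        owner.state[addr] = SHARED
        entry.sharers.add(owner.id)
        entry.owner = None
    node.data[addr] = system.memory.get(addr, 0)
    node.state[addr] = SHARED
    entry.sharers.add(node.id)

def software_write_miss(node, addr):
    """Software handler (assumed): obtain an exclusive copy of addr."""
    system = node.system
    system.handler_calls += 1
    entry = system.directory.setdefault(addr, DirectoryEntry())
    if entry.owner is not None and entry.owner != node.id:
        # Recall the line from the remote owner.
        owner = system.nodes[entry.owner]
        system.memory[addr] = owner.data[addr]
        owner.state[addr] = INVALID
    for sharer in entry.sharers - {node.id}:
        system.nodes[sharer].state[addr] = INVALID   # invalidate remote read copies
    node.data[addr] = system.memory.get(addr, 0)
    node.state[addr] = MODIFIED
    entry.owner = node.id
    entry.sharers = set()

class Node:
    def __init__(self, node_id, system):
        self.id = node_id
        self.system = system
        self.state = {}   # addr -> local permission (the hypothetical hardware check reads this)
        self.data = {}    # addr -> locally cached value

    def load(self, addr):
        if self.state.get(addr, INVALID) == INVALID:   # check fails: enter software
            software_read_miss(self, addr)
        return self.data[addr]

    def store(self, addr, value):
        if self.state.get(addr, INVALID) != MODIFIED:  # check fails: enter software
            software_write_miss(self, addr)
        self.data[addr] = value

class System:
    def __init__(self, num_nodes):
        self.memory = {}          # home copy: addr -> value
        self.directory = {}       # addr -> DirectoryEntry
        self.handler_calls = 0    # how often software had to run
        self.nodes = [Node(i, self) for i in range(num_nodes)]
```

In this sketch, accesses that pass the local permission check never enter software. Only misses and upgrades invoke a handler, which is the property a low-overhead hardware check is meant to provide in designs of this kind.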

Published In

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

SC '07 paper acceptance rate: 54 of 268 submissions (20%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)


Cited By

  • (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. ACM SIGARCH Computer Architecture News 39:3 (93-104). DOI: 10.1145/2024723.2000076. Online publication date: 4-Jun-2011
  • (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. Proceedings of the 38th annual international symposium on Computer architecture (93-104). DOI: 10.1145/2000064.2000076. Online publication date: 4-Jun-2011
  • (2010) An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs. International Journal of Parallel Programming 39:3 (271-295). DOI: 10.1007/s10766-010-0162-1. Online publication date: 29-Dec-2010
  • (2008) Micro-benchmarks for cluster OpenMP implementations. Proceedings of the 4th international conference on OpenMP in a new era of parallelism (60-70). DOI: 10.5555/1789826.1789834. Online publication date: 12-May-2008
  • (2008) Micro-benchmarks for Cluster OpenMP Implementations: Memory Consistency Costs. OpenMP in a New Era of Parallelism (60-70). DOI: 10.1007/978-3-540-79561-2_6. Online publication date: 2008
