The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

Farkas, Keith I.; Chow, Paul; Jouppi, Norman P.; Vranesic, Zvonko

doi:10.1023/A:1018782806674

Published: October 1999

International Journal of Parallel Programming, Volume 27, pages 327–356 (1999)

Abstract

The multicluster architecture that we introduce offers a decentralized, dynamically-scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm, and using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that for the configurations considered, the multicluster architecture may have significant performance advantages at feature sizes below 0.35 μm, and warrants further investigation.
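As a rough illustration of the partitioning described in the abstract, the sketch below assigns each architectural register to one cluster and counts the instructions in a small trace whose operands span more than one cluster. Such instructions are the ones that would plausibly incur the instruction-execution overhead mentioned above. The interleaved register assignment, the `Instr` type, and the function names are assumptions made for this example; they are not the authors' assignment scheme or their static scheduling algorithm.

```python
from typing import Iterable, List, NamedTuple, Set


class Instr(NamedTuple):
    """A simplified instruction: one destination and some source registers."""
    dest: int
    srcs: List[int]


def cluster_of(reg: int, num_clusters: int = 2) -> int:
    # Hypothetical assignment: interleave architectural registers across
    # clusters. The paper's actual assignment is not given in the abstract.
    return reg % num_clusters


def clusters_touched(instr: Instr, num_clusters: int = 2) -> Set[int]:
    # Set of clusters that own any register named by the instruction.
    regs = [instr.dest, *instr.srcs]
    return {cluster_of(r, num_clusters) for r in regs}


def count_cross_cluster(trace: Iterable[Instr], num_clusters: int = 2) -> int:
    # Count instructions whose registers live in more than one cluster.
    return sum(1 for i in trace if len(clusters_touched(i, num_clusters)) > 1)


trace = [
    Instr(dest=2, srcs=[4, 6]),  # all registers in cluster 0
    Instr(dest=3, srcs=[1, 5]),  # all registers in cluster 1
    Instr(dest=8, srcs=[2, 3]),  # operands split across both clusters
]
print(count_cross_cluster(trace))
```

In this toy model, a scheduler or register allocator that lowers the cross-cluster count would reduce the overhead introduced by partitioning, which is the kind of effect the abstract attributes to the static scheduling algorithm.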



Cite this article

Farkas, K.I., Chow, P., Jouppi, N.P. et al. The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning. International Journal of Parallel Programming 27, 327–356 (1999). https://doi.org/10.1023/A:1018782806674
