The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming

Abstract

The multicluster architecture that we introduce offers a decentralized, dynamically-scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm, and using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that for the configurations considered, the multicluster architecture may have significant performance advantages at feature sizes below 0.35 μm, and warrants further investigation.
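As a rough illustration of where the instruction-execution overhead comes from, the sketch below assigns each architectural register to a cluster and counts the instructions in a small trace whose registers span more than one cluster. The two-cluster configuration, the even/odd register split, the sample trace, and all names (`cluster_of`, `Instr`, `count_cross_cluster`) are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' scheduling algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

NUM_CLUSTERS = 2  # assumption: a two-cluster configuration


def cluster_of(reg: int) -> int:
    """Hypothetical static assignment: even registers to cluster 0, odd to cluster 1."""
    return reg % NUM_CLUSTERS


@dataclass(frozen=True)
class Instr:
    dest: int               # destination architectural register
    srcs: Tuple[int, ...]   # source architectural registers


def clusters_touched(instr: Instr) -> Set[int]:
    """Return the set of clusters that hold any register named by the instruction."""
    return {cluster_of(r) for r in (instr.dest, *instr.srcs)}


def count_cross_cluster(trace: Iterable[Instr]) -> int:
    """Count instructions whose registers are spread over more than one cluster."""
    return sum(1 for instr in trace if len(clusters_touched(instr)) > 1)


# Hypothetical three-instruction trace.
trace = [
    Instr(dest=2, srcs=(4, 6)),  # all registers in cluster 0
    Instr(dest=3, srcs=(2, 5)),  # registers in both clusters
    Instr(dest=1, srcs=(3, 7)),  # all registers in cluster 1
]

print(count_cross_cluster(trace))
```

Under these assumptions, a register assignment that keeps the operands of dependent instructions in the same cluster lowers this count, which is the kind of effect a static scheduling pass could target.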

Cite this article

Farkas, K.I., Chow, P., Jouppi, N.P. et al. The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning. International Journal of Parallel Programming 27, 327–356 (1999). https://doi.org/10.1023/A:1018782806674

  • DOI: https://doi.org/10.1023/A:1018782806674
