DOI: 10.1145/1077603.1077616
Article

Understanding the energy efficiency of SMT and CMP with multiclustering

Published: 08 August 2005

Abstract

In this paper we study the energy efficiency of SMT and CMP with multiclustering. Through a detailed design space exploration, we show that clustering closes the energy efficiency gap between SMT and CMP at equal performance points. Specifically, we show that the energy efficiency advantage of CMP over SMT at a given performance level decreases from a maximum of 25% in the monolithic processor case to 6% when the processor resources are clustered. By carefully considering floorplans, we show that this is enabled, in part, by the small energy consumption (less than 3%) of the interconnection buses required for clustering, even with SMT. As the gap narrows, we show that the efficiency of SMT versus CMP depends on the contribution of leakage energy: at lower leakage, the CMP tends to be better than the SMT, while the SMT outperforms the CMP at higher leakage levels. We demonstrate these results over a wide range of performance and machine configurations.

    Published In

    ISLPED '05: Proceedings of the 2005 international symposium on Low power electronics and design
    August 2005
    400 pages
    ISBN:1595931376
    DOI:10.1145/1077603

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. chip multiprocessing
    2. energy efficiency
    3. simultaneous multithreading

    Qualifiers

    • Article

    Conference

    ISLPED05

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%

    Cited By

    • (2014) The sharing architecture. ACM SIGARCH Computer Architecture News 42:1 (559-574). DOI: 10.1145/2654822.2541950. Online publication date: 24-Feb-2014
    • (2014) The sharing architecture. ACM SIGPLAN Notices 49:4 (559-574). DOI: 10.1145/2644865.2541950. Online publication date: 24-Feb-2014
    • (2014) The sharing architecture. Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (559-574). DOI: 10.1145/2541940.2541950. Online publication date: 24-Feb-2014
    • (2006) A SMT-ARM simulator and performance evaluation. Proceedings of the 5th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems (208-210). DOI: 10.5555/1365739.1365772. Online publication date: 15-Feb-2006
