Research article
DOI: 10.1145/1669112.1669172

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Published: 12 December 2009

Abstract

This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators.
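As a rough illustration of how per-component power estimates can be combined into a chip total, the sketch below uses textbook first-order approximations: dynamic power as αCV²f and leakage power as V·I_leak, with short-circuit power supplied directly as an input. These formulas, the `ComponentPower` class, and the `chip_power_w` function are assumptions made for illustration only; they are not McPAT's circuit-level models, which are considerably more detailed.

```python
from dataclasses import dataclass


@dataclass
class ComponentPower:
    """First-order power inputs for one chip component (hypothetical model)."""

    activity: float           # switching activity factor (alpha), 0..1
    capacitance_f: float      # effective switched capacitance, farads
    vdd_v: float              # supply voltage, volts
    freq_hz: float            # clock frequency, hertz
    short_circuit_w: float    # short-circuit power, supplied directly as an input
    leakage_current_a: float  # total leakage current, amperes

    def dynamic_w(self) -> float:
        # Textbook switching-power approximation: P = alpha * C * Vdd^2 * f
        return self.activity * self.capacitance_f * self.vdd_v ** 2 * self.freq_hz

    def leakage_w(self) -> float:
        # Static power approximated as Vdd * I_leak
        return self.vdd_v * self.leakage_current_a

    def total_w(self) -> float:
        # Total = dynamic + short-circuit + leakage
        return self.dynamic_w() + self.short_circuit_w + self.leakage_w()


def chip_power_w(components: dict[str, ComponentPower]) -> float:
    """Sum total power over named components (e.g. cores, NoC routers, caches)."""
    return sum(c.total_w() for c in components.values())


if __name__ == "__main__":
    # Arbitrary illustrative values, not figures from the paper.
    core = ComponentPower(0.5, 1e-9, 1.0, 2e9, 0.05, 0.2)
    print(f"core: {core.total_w():.2f} W")
    print(f"two cores: {chip_power_w({'c0': core, 'c1': core}):.2f} W")
```

Keeping each component's inputs in one record makes it easy to sum over an arbitrary set of cores, caches, and interconnect blocks, which is the shape of the aggregation a chip-level model performs.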
Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics such as the energy-delay-area² product (EDA²P) and the energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance: the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of clusters due to the synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that, when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
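The three metrics weight die area differently: EDP = E·D ignores area, EDAP = E·D·A weights it linearly, and EDA²P = E·D·A² penalizes it quadratically. The sketch below shows how the same candidate configurations can rank differently depending on the metric. The `DesignPoint` class, the `best` helper, and the numbers in the usage block are hypothetical and chosen only to make the ranking change visible; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DesignPoint:
    """One candidate configuration: energy (J), delay (s), and die area (mm^2)."""

    label: str
    energy_j: float
    delay_s: float
    area_mm2: float

    def edp(self) -> float:
        # Energy-delay product: ignores area (die cost)
        return self.energy_j * self.delay_s

    def edap(self) -> float:
        # Energy-delay-area product: area weighted linearly
        return self.energy_j * self.delay_s * self.area_mm2

    def eda2p(self) -> float:
        # Energy-delay-area^2 product: area penalized quadratically
        return self.energy_j * self.delay_s * self.area_mm2 ** 2


def best(points: Iterable[DesignPoint], metric: Callable[[DesignPoint], float]) -> DesignPoint:
    """Return the design point that minimizes `metric` (lower is better for all three)."""
    return min(points, key=metric)


if __name__ == "__main__":
    # Arbitrary illustrative numbers, NOT results from the paper:
    # config-A uses less energy but twice the area of config-B.
    a = DesignPoint("config-A", energy_j=10.0, delay_s=1.0, area_mm2=200.0)
    b = DesignPoint("config-B", energy_j=12.0, delay_s=1.0, area_mm2=100.0)
    for name, metric in [("EDP", DesignPoint.edp), ("EDAP", DesignPoint.edap), ("EDA^2P", DesignPoint.eda2p)]:
        print(name, "->", best([a, b], metric).label)
```

With these made-up inputs, EDP prefers config-A while EDAP and EDA²P prefer the smaller config-B, a miniature version of how accounting for die cost can change which configuration looks best.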

Published In

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112

Publisher

Association for Computing Machinery
New York, NY, United States

Conference

MICRO-42

Acceptance Rates

Overall Acceptance Rate: 484 of 2,242 submissions, 22%
