DOI: 10.1145/1165573.1165583

Stall cycle redistribution in a transparent fetch pipeline

Published: 04 October 2006

Abstract

Power and power density are now primary design constraints for modern high-performance microprocessors. Up to 70% of the dynamic power consumed can be attributed to the clocking system. A consequence of this trend is that clock gating has emerged as both a necessary and efficient method to significantly reduce dynamic power.

Transparent pipelining, a recently proposed fine-grain clock gating technique, has the potential to significantly reduce clock power above and beyond conventional pipestage-level clock gating. Previous studies of transparent pipelining have focused on the circuit and implementation-related issues of this approach, while neglecting the broader microarchitectural implications. This paper aims to quantify the microarchitectural opportunities that are afforded by the use of transparent pipelining in a processor's fetch pipeline. We develop a technique, based on stall cycle redistribution, designed to improve the performance of transparent pipelining on fetch and other high-utilization pipelines. We show that stall cycle redistribution can dramatically reduce the clocking overhead of an aggressively pipelined Cell-like microprocessor.
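
To put the 70% figure in perspective, the sketch below bounds the share of total dynamic power that any clock-gating scheme could save. It is an illustrative back-of-the-envelope model, not the paper's evaluation method: the function name `dynamic_power_saving`, the parameters `clock_fraction` and `gated_fraction`, and the 50% sample value are all assumptions introduced here.

```python
def dynamic_power_saving(clock_fraction: float, gated_fraction: float) -> float:
    """Fraction of total dynamic power saved (illustrative model only).

    clock_fraction: share of dynamic power drawn by the clocking system
                    (the abstract quotes "up to 70%", i.e. at most 0.70).
    gated_fraction: share of that clock power removed by gating
                    (hypothetical input, not a result from the paper).
    """
    for name, value in (("clock_fraction", clock_fraction),
                        ("gated_fraction", gated_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Only the clock's share of the power budget can be reduced by gating.
    return clock_fraction * gated_fraction


if __name__ == "__main__":
    # Ideal case: every clock pulse is gated away, so the saving equals
    # the clock's entire share of dynamic power.
    print(f"ideal gating: {dynamic_power_saving(0.70, 1.0):.0%}")
    # Hypothetical case: gating removes half of the clock power.
    print(f"half gated:   {dynamic_power_saving(0.70, 0.5):.0%}")
```

Under these assumptions the saving scales linearly with how much clock activity is gated, which is why a finer-grain technique that gates more pulses translates directly into a larger cut in dynamic power.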


Cited By

  • (2018) Aggressive Slack Recycling via Transparent Pipelines. Proceedings of the International Symposium on Low Power Electronics and Design, pp. 1-6. DOI: 10.1145/3218603.3218623. Online publication date: 23-Jul-2018
  • (2008) Instruction-driven clock scheduling with glitch mitigation. Proceedings of the 2008 international symposium on Low Power Electronics & Design, pp. 357-362. DOI: 10.1145/1393921.1394017. Online publication date: 11-Aug-2008

    Published In

    ISLPED '06: Proceedings of the 2006 international symposium on Low power electronics and design
    October 2006
    446 pages
    ISBN:1595934626
    DOI:10.1145/1165573

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic power
    2. instruction fetch
    3. microarchitecture
    4. pipeline gating

    Qualifiers

    • Article

    Conference

    ISLPED06: International Symposium on Low Power Electronics and Design
    October 4-6, 2006
    Tegernsee, Bavaria, Germany

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%
