Article

Low-power, low-complexity instruction issue using compiler assistance

Authors:

Madhavi G. Valluri,

Kathryn S. McKinley

ICS '05: Proceedings of the 19th annual international conference on Supercomputing

Pages 209 - 218

https://doi.org/10.1145/1088149.1088177

Published: 20 June 2005

Abstract

In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their data-ready order rather than their original program order to achieve high performance. The logic that facilitates dynamic issue is one of the most power-hungry and time-critical components in a typical out-of-order issue processor.

This paper develops a cooperative hardware/software technique to reduce complexity and energy consumption of the issue logic. The proposed scheme is based on the observation that not all instructions in a program require the same amount of dynamic reordering. Instructions that belong to basic blocks for which the compiler can perform near-optimal scheduling do not need any intra-block instruction reordering but require only inter-block instruction overlap. In contrast, blocks where the compiler is limited by artificial dependences and memory misses require both intra-block and inter-block instruction reordering. The proposed Reorder-Sensitive Issue Scheme utilizes a novel compile-time analyzer to evaluate the quality of schedules generated by the static scheduler and to estimate the dynamic reordering requirement of instructions within each basic block. At the micro-architecture-level, we propose a novel issue queue that exploits the varying dynamic scheduling requirement of basic blocks to lower the power dissipation and complexity of the dynamic issue hardware.

An evaluation of the technique on several SPEC integer benchmarks indicates that we can reduce the energy consumption in the issue queue on average by 72% with only 5% performance degradation. Additionally, the proposed issue hardware is significantly less complex when compared to a conventional monolithic out-of-order issue queue, providing the potential for high clock speeds.
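The abstract does not spell out how the compile-time analyzer works, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a basic block is a list of instructions in the compiler's chosen order, each with a fixed latency and dependences on earlier instructions in the same block. The names `Instr`, `may_miss`, `static_length`, `dataflow_length`, `needs_intra_block_reordering`, the issue width, and the 5% `slack` threshold are all hypothetical. A block is flagged as needing intra-block reordering when in-order issue of the static schedule is noticeably slower than data-ready issue, or when it contains a load that might miss.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    latency: int                               # assumed fixed latency in cycles (>= 1)
    deps: list = field(default_factory=list)   # names of earlier instructions in this block
    may_miss: bool = False                     # hypothetical flag: load that might miss in cache


def static_length(block, issue_width):
    """Cycles needed when the block issues strictly in the compiler's order."""
    finish = {}
    cycle, issued = 0, 0
    for ins in block:
        ready = max((finish[d] for d in ins.deps), default=0)
        if ready > cycle:              # stall until operands arrive
            cycle, issued = ready, 0
        elif issued == issue_width:    # current cycle is full
            cycle, issued = cycle + 1, 0
        finish[ins.name] = cycle + ins.latency
        issued += 1
    return max(finish.values(), default=0)


def dataflow_length(block, issue_width):
    """Cycles needed when instructions may issue in data-ready order."""
    finish = {}
    pending = list(block)
    cycle = 0
    while pending:
        # An instruction is ready once every producer has finished.
        ready = [i for i in pending
                 if all(d in finish and finish[d] <= cycle for d in i.deps)]
        for ins in ready[:issue_width]:
            finish[ins.name] = cycle + ins.latency
            pending.remove(ins)
        cycle += 1
    return max(finish.values(), default=0)


def needs_intra_block_reordering(block, issue_width=2, slack=0.05):
    """Assumed heuristic: reorder if a miss is possible or the static schedule lags."""
    if any(i.may_miss for i in block):
        return True
    static = static_length(block, issue_width)
    ideal = dataflow_length(block, issue_width)
    return static > ideal * (1 + slack)


# A well-scheduled block: in-order issue already matches data-ready issue.
well = [Instr("a", 1), Instr("b", 1), Instr("c", 1, ["a", "b"]), Instr("d", 1, ["c"])]
# A poorly ordered block: independent work sits behind a long-latency chain.
poor = [Instr("x", 3), Instr("y", 1, ["x"]), Instr("z", 1), Instr("w", 1, ["z"])]

print(needs_intra_block_reordering(well))
print(needs_intra_block_reordering(poor))
```

In this toy model, a block that returns False would only need inter-block overlap, while a block that returns True would be steered to hardware that can reorder within the block. The real analyzer and issue queue described in the paper may use entirely different criteria.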

Published In

ICS '05: Proceedings of the 19th annual international conference on Supercomputing

June 2005

414 pages

ISBN: 1595931678

DOI: 10.1145/1088149

General Chair: Arvind (MIT)

Program Chair: Larry Rudolph (MIT)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS05: International Conference on Supercomputing 2005

Sponsor: SIGARCH

June 20 - 22, 2005

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

McFarlin D, Tucker C, Zilles C (2013). Discerning the dominant out-of-order performance advantage. ACM SIGPLAN Notices 48:4 (241-252). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2499368.2451143

McFarlin D, Tucker C, Zilles C (2013). Discerning the dominant out-of-order performance advantage. ACM SIGARCH Computer Architecture News 41:1 (241-252). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2490301.2451143

McFarlin D, Tucker C, Zilles C, Sarkar V, Bodik R (2013). Discerning the dominant out-of-order performance advantage. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (241-252). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2451116.2451143

Hu S, John L, Boehm H, Grove D (2006). Impact of virtual execution environments on processor energy consumption and hardware adaptation. Proceedings of the 2nd international conference on Virtual execution environments (100-110). Online publication date: 14-Jun-2006.
https://dl.acm.org/doi/10.1145/1134760.1134775

Valluri M, John L, Hanson H (2006). Hybrid-scheduling for reduced energy consumption in high-performance processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14:9 (1039-1043). Online publication date: 1-Sep-2006.
https://dl.acm.org/doi/10.1109/TVLSI.2006.884055
