DOI: 10.1145/1088149.1088177
Article

Low-power, low-complexity instruction issue using compiler assistance

Published: 20 June 2005

Abstract

In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their data-ready order rather than their original program order to achieve high performance. The logic that supports dynamic issue is one of the most power-hungry and timing-critical components of a typical out-of-order issue processor.

This paper develops a cooperative hardware/software technique to reduce the complexity and energy consumption of the issue logic. The proposed scheme is based on the observation that not all instructions in a program require the same amount of dynamic reordering. Instructions in basic blocks for which the compiler can produce a near-optimal schedule need no intra-block instruction reordering; they require only inter-block instruction overlap. In contrast, blocks in which the compiler is limited by artificial dependences and memory misses require both intra-block and inter-block instruction reordering. The proposed Reorder-Sensitive Issue Scheme uses a novel compile-time analyzer to evaluate the quality of the schedules generated by the static scheduler and to estimate the dynamic reordering requirement of the instructions within each basic block. At the microarchitecture level, we propose a novel issue queue that exploits the varying dynamic scheduling requirements of basic blocks to lower the power dissipation and complexity of the dynamic issue hardware.

An evaluation of the technique on several SPEC integer benchmarks indicates that we can reduce the energy consumption of the issue queue by 72% on average with only 5% performance degradation. Additionally, the proposed issue hardware is significantly less complex than a conventional monolithic out-of-order issue queue, providing the potential for high clock speeds.
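The abstract does not spell out how the compile-time analyzer judges schedule quality. As an illustration only, the Python sketch below assumes one plausible heuristic: compare the length of a basic block's static schedule with a lower bound (the larger of its dependence height and its issue-width bound), and tag a block as needing only inter-block overlap when its schedule is within a `slack` factor of that bound and it contains no loads flagged as possible cache misses. The `Instr` representation, the latencies, the 10% `slack`, and the two tags are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Instr:
    """One instruction of a basic block (hypothetical IR, not the paper's)."""
    name: str
    latency: int                                   # cycles until the result is ready
    deps: list[int] = field(default_factory=list)  # indices of true-data producers in the block
    may_miss: bool = False                         # load the compiler cannot prove hits in cache


def dependence_height(block: list[Instr]) -> int:
    """Critical-path length in cycles; producers must precede consumers in `block`."""
    finish = [0] * len(block)
    for i, ins in enumerate(block):
        ready = max((finish[d] for d in ins.deps), default=0)
        finish[i] = ready + ins.latency
    return max(finish, default=0)


def schedule_lower_bound(block: list[Instr], issue_width: int) -> int:
    """No schedule can beat the dependence height or the issue-width limit."""
    resource_bound = ceil(len(block) / issue_width)
    return max(dependence_height(block), resource_bound)


def classify_block(block: list[Instr], static_length: int,
                   issue_width: int = 4, slack: float = 1.1) -> str:
    """Tag a block 'in-order' (inter-block overlap only) or 'reorder' (hypothetical tags)."""
    if any(ins.may_miss for ins in block):
        # The static schedule assumed a hit latency that may be wrong at run time.
        return "reorder"
    bound = schedule_lower_bound(block, issue_width)
    return "in-order" if static_length <= slack * bound else "reorder"


# Example: a load -> add -> mul chain plus one independent add.
block = [
    Instr("ld r1", latency=3),
    Instr("add r2", latency=1, deps=[0]),
    Instr("mul r3", latency=3, deps=[1]),
    Instr("add r4", latency=1),
]
print(dependence_height(block))   # 7
print(classify_block(block, 7))   # in-order: schedule matches the bound
print(classify_block(block, 10))  # reorder: schedule is 3 cycles longer than the bound
```

Because `deps` records only true data dependences, a static schedule lengthened by artificial (false) dependences ends up longer than the bound and is tagged `reorder`, which mirrors the distinction the abstract draws between near-optimally scheduled blocks and blocks that still need intra-block reordering.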


Information

Published In

ICS '05: Proceedings of the 19th annual international conference on Supercomputing
June 2005
414 pages
ISBN: 1595931678
DOI: 10.1145/1088149
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ICS '05: International Conference on Supercomputing 2005
June 20-22, 2005
Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGPLAN Notices 48:4, pp. 241-252. DOI: 10.1145/2499368.2451143. Online publication date: 16-Mar-2013
  • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGARCH Computer Architecture News 41:1, pp. 241-252. DOI: 10.1145/2490301.2451143. Online publication date: 16-Mar-2013
  • (2013) Discerning the dominant out-of-order performance advantage. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, pp. 241-252. DOI: 10.1145/2451116.2451143. Online publication date: 16-Mar-2013
  • (2006) Impact of virtual execution environments on processor energy consumption and hardware adaptation. Proceedings of the 2nd international conference on Virtual execution environments, pp. 100-110. DOI: 10.1145/1134760.1134775. Online publication date: 14-Jun-2006
  • (2006) Hybrid-scheduling for reduced energy consumption in high-performance processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14:9, pp. 1039-1043. DOI: 10.1109/TVLSI.2006.884055. Online publication date: 1-Sep-2006
