research-article

Coupling latency-insensitivity with variable-latency for better than worst case design: a RISC case study

Authors:

Stefano Colazzo,

Paolo Mantovani

GLSVLSI '11: Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI

Pages 163 - 168

https://doi.org/10.1145/1973009.1973043

Published: 02 May 2011

Abstract

The gap between worst-case and typical-case delays is bound to widen in nanometer-scale technologies because of the spread in process manufacturing parameters. To keep profiting from scaling, designs should tolerate worst-case delays seamlessly, with minimal performance degradation relative to the typical case. We present a simple RISC core that tolerates worst-case extra latency by coupling the Latency-Insensitive Design approach with a Variable-Latency mechanism. Stalls caused by excessive delay, by data and control hazards, and by late memory accesses are all handled in a uniform way. Compared with a pure worst-case approach, our design method increases the core clock frequency by 23% in a 45 nm CMOS technology, without area or power penalty.
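
The idea that every stall source can be absorbed by one mechanism can be illustrated with a small cycle-level model. This is a minimal sketch under stated assumptions, not the authors' RTL or evaluation flow. The names Token, simulate_stage and net_speedup, the single-stage abstraction, and the workload in which one instruction in 20 exercises a slow path are all hypothetical. Only the 23% frequency gain comes from the abstract. The stage behaves like a latency-insensitive shell. While an instruction still needs extra cycles, the stage emits a void token downstream and asserts stop upstream. The reason for the delay does not matter to the shell. Execution time is modeled as T = N / f, where N is the number of cycles and f is the clock frequency. The net speedup over a worst-case design is therefore (1 + g) * N_wc / N_vl, where g is the relative frequency gain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One instruction flowing through the stage (illustrative model only)."""
    name: str
    # Hypothetical: extra cycles this instruction holds the stage. The shell
    # does not care *why*: a slow (near-worst-case) ALU path, a hazard bubble,
    # or a late memory access all look the same.
    extra_cycles: int = 0


def simulate_stage(tokens: List[Token]) -> List[Optional[str]]:
    """Cycle-by-cycle model of one variable-latency stage in a
    latency-insensitive shell.

    Each cycle the stage emits either a valid token (its name) or a void
    token (None). While an instruction still needs extra cycles, the stage
    keeps it, emits void downstream, and asserts stop upstream so the
    producer holds the next instruction instead of dropping it.
    """
    pending = list(tokens)
    current: Optional[Token] = None
    remaining = 0
    trace: List[Optional[str]] = []
    while pending or current is not None:
        out: Optional[str] = None          # void unless a result is ready
        if current is not None:
            if remaining > 0:
                remaining -= 1             # slow case: spend one more cycle
            else:
                out = current.name         # result ready: emit a valid token
                current = None
        stop = current is not None         # still busy -> backpressure upstream
        if not stop and pending:
            current = pending.pop(0)       # accept the next instruction
            remaining = current.extra_cycles
        trace.append(out)
    return trace


def net_speedup(freq_gain: float, cycles_worst_case: int, cycles_variable: int) -> float:
    """Ratio T_wc / T_vl with T = cycles / frequency.

    freq_gain is the relative clock increase of the variable-latency design
    over the worst-case design (e.g. 0.23 for +23%).
    """
    return (1.0 + freq_gain) * cycles_worst_case / cycles_variable


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions, every 20th exercises a slow path.
    workload = [Token(f"i{k}", 1 if k % 20 == 0 else 0) for k in range(100)]
    # Worst-case clocking: the slow path fits in one (longer) cycle.
    worst = [Token(t.name, 0) for t in workload]
    n_vl = len(simulate_stage(workload))
    n_wc = len(simulate_stage(worst))
    print(n_wc, n_vl, round(net_speedup(0.23, n_wc, n_vl), 3))  # prints: 101 106 1.172
```

With these made-up numbers, the 23% clock gain becomes a net speedup of roughly 17%. The model also shows the break-even point. If slow paths are exercised often enough that N_vl / N_wc exceeds 1.23, the variable-latency design would be slower than the worst-case one.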

References

[1]

D. Ernst et al., "Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation," Proc. MICRO-36, Dec. 2003, pp. 7--18.

[2]

D. Blaauw et al., "Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance," Proc. ISSCC, Feb. 2008, pp. 400--622.

[3]

S. Ghosh et al., "CRISTA: A New Paradigm for Low-Power, Variation-Tolerant, and Adaptive Circuit Synthesis Using Critical Path Isolation," IEEE TCAD, vol. 26, no. 11, Nov. 2007, pp. 1947--1956.

[4]

L.P. Carloni et al., "A Methodology for Correct-by-Construction Latency Insensitive Design," Proc. ICCAD, Nov. 1999, pp. 309--315.

[5]

A.J. Martin et al., "The design of an asynchronous MIPS R3000 microprocessor," Proc. ARVLSI, Sep. 1997, pp. 164--181.

[6]

H.M. Jacobson et al., "Synchronous Interlocked Pipelines," Proc. ASYNC, Apr. 2002, pp. 3--12.

[7]

J. Cortadella et al., "Synthesis of Synchronous Elastic Architectures," Proc. DAC, July 2006, pp. 657--662.

[8]

X. Liang and D. Brooks, "Mitigating the Impact of Process Variations on CPU Register File and Execution Units," Proc. MICRO-39, Dec. 2006, pp. 504--514.

[9]

D. Bañeres et al., "Variable-Latency Design by Function Speculation," Proc. DATE, Apr. 2009, pp. 1704--1709.

[10]

M. Olivieri, "Design of Synchronous and Asynchronous Variable-Latency Pipelined Multipliers," IEEE TVLSI, vol. 9, no. 2, April 2001, pp. 365--376.

[11]

P. Ndai et al., "Trifecta: A Nonspeculative Scheme to Exploit Common, Data-Dependent Subcritical Paths," IEEE TVLSI, vol. 18, no. 1, Jan. 2010, pp. 53--65.

[12]

R.P. Brent and H.T. Kung, "A Regular Layout for Parallel Adders," IEEE TCOMP, vol. C-31, no. 3, 1982, pp. 260--264.

[13]

N. Pinckney et al., "A MIPS R2000 Implementation," Proc. DAC, June 2008, pp. 102--107.

[14]

Google Code hmc-mips, http://code.google.com/p/hmc-mips/

[15]

University of Michigan at Ann Arbor - Electrical Engineering and Computer Science Department, MiBench: a free, commercially representative embedded benchmark suite, http://www.eecs.umich.edu/mibench/

Cited By

M. Kamal, A. Afzali-Kusha, S. Safari, and M. Pedram (2016), "Yield and Speedup Improvements in Extensible Processors by Allocating Extra Cycles to Some Custom Instructions," ACM Transactions on Design Automation of Electronic Systems, 21(2), pp. 1-25. DOI: 10.1145/2830566. Online publication date: 28-Jan-2016.
https://dl.acm.org/doi/10.1145/2830566
L. Carloni (2015), "From Latency-Insensitive Design to Communication-Based System-Level Design," Proceedings of the IEEE, 103(11), pp. 2133-2151. DOI: 10.1109/JPROC.2015.2480849. Online publication date: Nov-2015.
https://doi.org/10.1109/JPROC.2015.2480849

Published In

GLSVLSI '11: Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI

May 2011

496 pages

ISBN:9781450306676

DOI:10.1145/1973009

General Chairs: David Atienza (EPFL, Switzerland), Yuan Xie (Penn State University, USA)
Program Chairs: Jose L. Ayala (Federal University of Pernambuco, Brazil), Ken Stevens (University of Utah, USA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE CEDA
IEEE CASS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GLSVLSI '11

Sponsor:

SIGDA

GLSVLSI '11: Great Lakes Symposium on VLSI 2011

May 2-4, 2011

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

Bibliometrics

Total Citations: 2
Total Downloads: 112
Downloads (last 12 months): 1
Downloads (last 6 weeks): 1

Reflects downloads up to 03 Mar 2025
