Article

Exploring the design space of LUT-based transparent accelerators

Authors:

Krisztiàn Flautner

CASES '05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems

Pages 11 - 21

https://doi.org/10.1145/1086297.1086301

Published: 24 September 2005

Abstract

Instruction set customization accelerates the performance of applications by compressing the length of critical dependence paths and reducing the demands on processor resources. With instruction set customization, specialized accelerators are added to a conventional processor to atomically execute dataflow subgraphs. Accelerators that are exploited without explicit changes to the instruction set architecture of the processor are said to be transparent. Transparent acceleration relies on a light-weight hardware engine to dynamically generate control signals for the accelerator, using subgraphs delineated by a compiler. The design of transparent subgraph accelerators is challenging, as critical subgraphs need to be supported efficiently while maintaining area and timing constraints. Additionally, more complex accelerators require more sophisticated control generation engines. These factors must be carefully balanced. In this work, we investigate the design of subgraph accelerators using configurable lookup table structures. These designs provide an effective paradigm to execute a wide range of subgraphs involving arithmetic and logic operations. We describe why lookup table designs are effective, how they fit into a transparent acceleration framework, and evaluate the effectiveness of a wide range of designs using both simulation and logic synthesis.
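
As a rough illustration of the lookup-table idea only (not the accelerator design evaluated in the paper), the sketch below collapses a small subgraph of dependent bitwise operations into a single truth table and then evaluates it in one step per bit position. The function names `configure_lut`, `lut_execute`, and `example_subgraph`, the 32-bit word width, and the choice of subgraph are all assumptions made for this example. It covers bitwise logic only; arithmetic subgraphs, which the abstract also mentions, would additionally need carry handling.

```python
def configure_lut(subgraph, num_inputs):
    """Build a truth table for a single-bit function of num_inputs bits.

    Entry i holds the output for the input combination whose j-th input
    bit equals bit j of i. (Hypothetical helper for illustration.)
    """
    table = []
    for index in range(1 << num_inputs):
        bits = [(index >> j) & 1 for j in range(num_inputs)]
        table.append(subgraph(*bits) & 1)
    return table


def lut_execute(table, operands, width=32):
    """Apply the same LUT independently at every bit position of the operands."""
    result = 0
    for pos in range(width):
        index = 0
        for j, word in enumerate(operands):
            index |= ((word >> pos) & 1) << j
        result |= table[index] << pos
    return result


def example_subgraph(a, b, c):
    # Two dependent operations (AND, then XOR) in a conventional pipeline.
    return (a & b) ^ c


# Configure once, then execute the whole subgraph as one table lookup per bit.
lut = configure_lut(example_subgraph, 3)
a, b, c = 0xF0F0A5A5, 0x0FF05A5A, 0x12345678
collapsed = lut_execute(lut, [a, b, c])
```

In this sketch the dependent AND and XOR are replaced by one lookup, which is the sense in which a configurable table can shorten a dependence chain; the table contents play the role of the configuration that a control generation engine would have to supply.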

Published In

CASES '05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems

September 2005

326 pages

ISBN: 159593149X

DOI: 10.1145/1086297

General Chairs: Thomas M. Conte (North Carolina State University), Paolo Faraboschi (Hewlett-Packard Laboratories)

Program Chairs: Bill Mangione-Smith (Quantum Intellectual Property Services), Walid Najjar (University of California, Riverside)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CASES05: 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

September 24 - 27, 2005

San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Bibliometrics

Total Citations: 23
Total Downloads: 350
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Mar 2025

Cited By

Paul S, Krishna A, Qian W, Karam R, Bhunia S (2015). MAHA: An Energy-Efficient Malleable Hardware Accelerator for Data-Intensive Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:6 (1005-1016). Online publication date: Jun-2015.
https://doi.org/10.1109/TVLSI.2014.2332538

Yazdanbakhsh A, Kamal M, Fakhraie S, Afzali-Kusha A, Safari S, Pedram M (2014). Implementation-aware selection of the custom instruction set for extensible processors. Microprocessors and Microsystems 38:7 (681-691). Online publication date: Oct-2014.
https://doi.org/10.1016/j.micpro.2014.05.007

Paul S, Bhunia S (2013). Motivation for a Memory-Based Computing Hardware. Computing with Memory for Energy-Efficient Robust Systems (29-34). Online publication date: 8-Aug-2013.
https://doi.org/10.1007/978-1-4614-7798-3_3

Gupta S, Feng S, Ansari A, Mahlke S, August D, Galuzzi C, Carro L, Moshovos A, Prvulovic M (2011). Bundled execution of recurring traces for energy-efficient general purpose processing. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (12-23). Online publication date: 3-Dec-2011.
https://dl.acm.org/doi/10.1145/2155620.2155623

Kornaros G, Motakis A (2010). On Scaling Speedup with Coarse-Grain Coprocessor Accelerators on Reconfigurable Platforms. Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (355-362). Online publication date: 1-Sep-2010.
https://dl.acm.org/doi/10.1109/DSD.2010.79

(2010). Authors index. 2010 12th Biennial Baltic Electronics Conference (365-368). Online publication date: Oct-2010.
https://doi.org/10.1109/BEC.2010.5630747

Jóźwiak L, Nedjah N, Figueroa M (2010). Modern development methods and tools for embedded reconfigurable systems. Integration, the VLSI Journal 43:1 (1-33). Online publication date: 1-Jan-2010.
https://dl.acm.org/doi/10.1016/j.vlsi.2009.06.002

Mehdipour F, Noori H, Javadi B, Honda H, Inoue K, Murakami K, Wakabayashi K (2009). A combined analytical and simulation-based model for performance evaluation of a reconfigurable instruction set processor. Proceedings of the 2009 Asia and South Pacific Design Automation Conference (564-569). Online publication date: 19-Jan-2009.
https://dl.acm.org/doi/10.5555/1509633.1509765

Mehdipour F, Noori H, Inoue K, Murakami K (2009). Rapid Design Space Exploration of a Reconfigurable Instruction-Set Processor. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92-A:12 (3182-3192). Online publication date: 2009.
https://doi.org/10.1587/transfun.E92.A.3182

Zuluaga M, Topham N (2009). Design-space exploration of resource-sharing solutions for custom instruction set extensions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28:12 (1788-1801). Online publication date: 1-Dec-2009.
https://dl.acm.org/doi/10.1109/TCAD.2009.2026355
