Article
DOI: 10.1145/1086297.1086301

Exploring the design space of LUT-based transparent accelerators

Published: 24 September 2005

Abstract

Instruction set customization improves application performance by compressing the length of critical dependence paths and reducing the demands on processor resources. With instruction set customization, specialized accelerators are added to a conventional processor to atomically execute dataflow subgraphs. Accelerators that are exploited without explicit changes to the instruction set architecture of the processor are said to be transparent. Transparent acceleration relies on a lightweight hardware engine to dynamically generate control signals for the accelerator, using subgraphs delineated by a compiler. The design of transparent subgraph accelerators is challenging, as critical subgraphs need to be supported efficiently while meeting area and timing constraints. Additionally, more complex accelerators require more sophisticated control generation engines. These factors must be carefully balanced. In this work, we investigate the design of subgraph accelerators using configurable lookup table structures. These designs provide an effective paradigm for executing a wide range of subgraphs involving arithmetic and logic operations. We describe why lookup table designs are effective and how they fit into a transparent acceleration framework, and we evaluate a wide range of designs using both simulation and logic synthesis.
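
As a rough illustration of the general idea (not the accelerator design evaluated in the paper), the sketch below shows how a small dataflow subgraph of logic operations can be collapsed into one configurable lookup table and then applied bitwise across a word. The function names build_lut and lut_eval_word, the 3-input table size, and the example subgraph are all assumptions made for this example, and arithmetic operations such as carry propagation are not modeled.

```python
def build_lut(fn, num_inputs):
    """Encode a boolean function as a truth-table bitmask.

    Bit i of the result is fn applied to the binary digits of i,
    with input k taken from bit k of i. (Hypothetical helper.)
    """
    config = 0
    for index in range(1 << num_inputs):
        bits = [(index >> k) & 1 for k in range(num_inputs)]
        if fn(*bits) & 1:
            config |= 1 << index
    return config


def lut_eval_word(config, operands, width=32):
    """Apply the same LUT configuration independently to every bit position."""
    result = 0
    for bit in range(width):
        index = 0
        for k, operand in enumerate(operands):
            index |= ((operand >> bit) & 1) << k
        result |= ((config >> index) & 1) << bit
    return result


def subgraph(a, b, c):
    """Example two-operation subgraph: t = a AND b; out = t XOR c."""
    return (a & b) ^ c


# "Compile time": derive the table contents once for the whole subgraph.
config = build_lut(subgraph, 3)

# "Run time": one table lookup per bit replaces the two dependent operations.
a, b, c = 0xF0F0, 0xFF00, 0x0FF0
collapsed = lut_eval_word(config, [a, b, c], width=16)
reference = (a & b) ^ c
print(hex(config), hex(collapsed), collapsed == reference)
```

In this sketch the two dependent operations share a single table, which is the sense in which a lookup structure can shorten a dependence path; how the table contents would be generated and loaded by a control engine is outside what the example models.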

Published In

CASES '05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
September 2005
326 pages
ISBN:159593149X
DOI:10.1145/1086297

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator design
  2. efficient computation
  3. embedded processing

Qualifiers

  • Article

Conference

CASES05

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Cited By

  • (2015) MAHA: An Energy-Efficient Malleable Hardware Accelerator for Data-Intensive Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:6 (1005-1016). DOI: 10.1109/TVLSI.2014.2332538. Online publication date: Jun-2015
  • (2014) Implementation-aware selection of the custom instruction set for extensible processors. Microprocessors and Microsystems 38:7 (681-691). DOI: 10.1016/j.micpro.2014.05.007. Online publication date: Oct-2014
  • (2013) Motivation for a Memory-Based Computing Hardware. Computing with Memory for Energy-Efficient Robust Systems (29-34). DOI: 10.1007/978-1-4614-7798-3_3. Online publication date: 8-Aug-2013
  • (2011) Bundled execution of recurring traces for energy-efficient general purpose processing. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (12-23). DOI: 10.1145/2155620.2155623. Online publication date: 3-Dec-2011
  • (2010) On Scaling Speedup with Coarse-Grain Coprocessor Accelerators on Reconfigurable Platforms. Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (355-362). DOI: 10.1109/DSD.2010.79. Online publication date: 1-Sep-2010
  • (2010) Authors index. 2010 12th Biennial Baltic Electronics Conference (365-368). DOI: 10.1109/BEC.2010.5630747. Online publication date: Oct-2010
  • (2010) Modern development methods and tools for embedded reconfigurable systems. Integration, the VLSI Journal 43:1 (1-33). DOI: 10.1016/j.vlsi.2009.06.002. Online publication date: 1-Jan-2010
  • (2009) A combined analytical and simulation-based model for performance evaluation of a reconfigurable instruction set processor. Proceedings of the 2009 Asia and South Pacific Design Automation Conference (564-569). DOI: 10.5555/1509633.1509765. Online publication date: 19-Jan-2009
  • (2009) Rapid Design Space Exploration of a Reconfigurable Instruction-Set Processor. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92-A:12 (3182-3192). DOI: 10.1587/transfun.E92.A.3182. Online publication date: 2009
  • (2009) Design-space exploration of resource-sharing solutions for custom instruction set extensions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28:12 (1788-1801). DOI: 10.1109/TCAD.2009.2026355. Online publication date: 1-Dec-2009
