PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Schreiber, Robert; Aditya, Shail; Mahlke, Scott; Kathail, Vinod; Rau, B. Ramakrishna; Cronquist, Darren; Sivaraman, Mukund

doi:10.1023/A:1015341305426


Abstract

The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C. The NPAs it generates consist of a synchronous array of one or more customized processor datapaths, their controller, local memory, and interfaces. The user, or a design space exploration tool that is a part of the full PICO system, identifies within the application a loop nest to be implemented as an NPA, and indicates the performance required of the NPA by specifying the number of processors and the number of machine cycles that each processor uses per iteration of the inner loop. PICO-NPA emits synthesizable HDL that defines the accelerator at the register transfer level (RTL). The system also modifies the user's application software to make use of the generated accelerator.
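As an illustration of the kind of input described above, the following is a minimal sketch of a C loop nest that a user might mark for acceleration, together with a back-of-the-envelope cycle estimate derived from the two performance parameters the abstract names: the number of processors and the cycles each processor spends per inner-loop iteration. The FIR kernel, the function names `fir` and `estimate_cycles`, and the estimate formula are all assumptions made for illustration; they are not taken from the paper. The formula also ignores pipeline fill and drain, memory stalls, and host interface overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): a FIR filter written
 * as a two-deep loop nest in C. The caller must supply at least
 * n_out + n_taps - 1 input samples in x. */
void fir(const int *x, const int *w, int *y, size_t n_out, size_t n_taps)
{
    for (size_t i = 0; i < n_out; i++) {
        int acc = 0;
        for (size_t j = 0; j < n_taps; j++)
            acc += w[j] * x[i + j];   /* inner-loop iteration */
        y[i] = acc;
    }
}

/* Rough estimate (an assumption, not the paper's cost model): split the
 * inner-loop iterations evenly across the processors, rounding up, and
 * charge a fixed number of cycles per iteration on each processor. */
unsigned long estimate_cycles(unsigned long inner_iterations,
                              unsigned processors,
                              unsigned cycles_per_iteration)
{
    assert(processors > 0);
    unsigned long per_processor =
        (inner_iterations + processors - 1) / processors;
    return per_processor * cycles_per_iteration;
}
```

For a nest with 12 inner-loop iterations, 4 processors at 2 cycles per iteration come out to 6 cycles under this simplified model. Varying the two parameters is one way to picture the range of cost and performance points that can be requested from a single specification.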

The main objective of PICO-NPA is to reduce design cost and time without significantly reducing design quality. Designing an NPA and its support software typically requires one or two weeks using PICO-NPA, which is a many-fold improvement over the industry norm. In addition, PICO-NPA can readily generate a wide range of implementations with scalable performance from a single specification. In experimental comparisons of NPAs of equivalent throughput, PICO-NPA designs are slightly more costly than hand-designed accelerators.

Logic synthesis and place-and-route have been performed successfully on PICO-NPA designs, which have achieved high clock rates.

Author information

Authors and Affiliations

Hewlett-Packard Laboratories, Palo Alto, California, 94304-1126, USA
Robert Schreiber, Shail Aditya, Scott Mahlke, Vinod Kathail, B. Ramakrishna Rau, Darren Cronquist & Mukund Sivaraman

About this article

Cite this article

Schreiber, R., Aditya, S., Mahlke, S. et al. PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 31, 127–142 (2002). https://doi.org/10.1023/A:1015341305426

Published: 01 June 2002
Issue Date: June 2002
DOI: https://doi.org/10.1023/A:1015341305426
