Abstract
Hardware accelerators, used as application-specific extensions to the computational capabilities of a system, are an efficient way to improve performance and reduce power dissipation in a System on Chip (SoC). These accelerators execute the computationally critical parts of an application, offloading that work from the scalar processors. In this paper, we present a design automation tool that generates accelerators from a given application kernel. The accelerators process streaming data and support a programming model that naturally expresses a large number of embedded applications and yields efficient, fast hardware implementations. We demonstrate the tool's applicability to architectural space exploration for a number of media applications, reporting results on area, throughput, and clock speed.
Index Terms
- Mapping streaming architectures on reconfigurable platforms