ABSTRACT
The capability and heterogeneity of FPGA (Field Programmable Gate Array) devices continue to increase with each new line of devices, and programming them efficiently is becoming more difficult. Nevertheless, FPGAs continue to be used for algorithms traditionally targeted to embedded DSP microprocessors, such as signal and image processing applications.

This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C.

Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations.

We show that our combined VLIW with hardware functions exhibits as much as 230X speedup, and 63X on average, for computational kernels from a set of benchmarks. This allows for an overall speedup of as much as 30X, and 12X on average, for signal processing benchmarks from MediaBench.
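The gap between the kernel speedups and the overall speedups reported above is what Amdahl's law would predict when only part of a program runs in the accelerated kernels. The sketch below is an illustration only: the function name and the kernel time fractions are hypothetical and are not taken from the paper, which does not state per-benchmark fractions in this abstract.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when a fraction of the original
    run time is accelerated by kernel_speedup and the rest is unchanged."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical fractions of run time spent in the kernel (assumed values).
    for fraction in (0.90, 0.95, 0.99):
        # 63X is the average kernel speedup quoted in the abstract.
        print(f"fraction={fraction:.2f} -> overall {overall_speedup(fraction, 63.0):.1f}X")
```

Whatever the kernel speedup, the overall figure cannot exceed 1 / (1 - kernel_fraction), so a large kernel speedup paired with a much smaller overall speedup is expected whenever some of the run time stays in software.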
Index Terms
- An FPGA-based VLIW processor with custom hardware execution