DOI: 10.1145/1046192.1046207
Article

An FPGA-based VLIW processor with custom hardware execution

Published: 20 February 2005

ABSTRACT

The capability and heterogeneity of new FPGA (Field Programmable Gate Array) devices continue to increase with each new line of devices, and programming these devices efficiently is becoming more difficult. Nevertheless, FPGAs continue to be used for algorithms traditionally targeted to embedded DSP microprocessors, such as signal and image processing applications.

This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C.

Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations.

We show that our combined VLIW with hardware functions exhibits as much as 230X speedup, and 63X on average, for the computational kernels of a set of benchmarks. This allows for an overall speedup of as much as 30X, and 12X on average, for signal processing benchmarks from MediaBench.
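The gap between kernel speedup and overall speedup reported in the abstract is what Amdahl's law would lead one to expect: only the time spent inside the accelerated kernels shrinks. The sketch below illustrates that relationship. The function name `overall_speedup` and the 90% kernel fraction are assumptions chosen for illustration; the abstract does not state how much runtime each benchmark spends in its kernels, so the numbers printed here are not results from the paper.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the kernels are accelerated.

    kernel_fraction: share of the original runtime spent in the kernels (0..1).
    kernel_speedup:  factor by which the kernels alone run faster.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated time stays the same; kernel time shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


# Hypothetical profile: 90% of runtime in kernels (assumed, not from the paper).
for s in (63.0, 230.0):
    print(f"kernel {s:.0f}X -> overall {overall_speedup(0.90, s):.2f}X")
```

Under this assumed profile, even an arbitrarily fast kernel cannot push the overall speedup past 10X, because the remaining 10% of the program still runs at its original speed.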


            Reviews

            Vassilios A. Chouliaras

This is a very exciting piece of research in the general area of configurable, extensible processors and the software/hardware interface. The authors propose a hybrid architecture, consisting of a parameterized very long instruction word (VLIW) core augmented with custom hardware execution units, as a very potent programmable execution engine. In addition, they have developed the software infrastructure to allow for automatic optimization of C-based applications.

In the introductory section, the authors identify large-capacity field-programmable gate arrays (FPGAs) with substantial compute/memory resources as becoming commonplace. They correctly point out that the efficient mapping of applications on such devices is not a trivial exercise anymore, with a typical use being software kernels allocated on the FPGA fabric, and the irregular (control) part of the application running on an embedded processor. This segregation has indeed been identified by the major FPGA vendors, which utilize embedded processors on their devices to accommodate both regular and irregular codes. The authors provide a good discussion of past and present behavioral synthesis solutions, and correctly identify such solutions as appropriate for combinational code, not for control-dominated applications. In addition, they provide a very good overview of the literature, both from academia and from industry, on configurable (static) and reconfigurable (dynamic) systems for software acceleration.

To address large, irregular code pieces in a semi-automatic manner, the authors propose a parametric platform to efficiently exploit all parallelism. The platform is a four-wide VLIW-based processor that is binary-compatible with the Altera NIOS II instruction set architecture (ISA). In addition, it supports extending that ISA with custom hardware resources to achieve superlinear speedups. The software infrastructure is based on the well-known Trimaran VLIW research infrastructure.

The authors use an interesting technique to extract computational kernels (hardware functions), which are implemented directly as hardware blocks. These blocks make use of the abundant MAC units in typical high-performance FPGA devices, such as the Altera Stratix family. The authors discuss their hardware architecture, which is based on a four-wide VLIW with an eight-read-port, four-write-port (8R/4W) 32x32-bit register file, shared among the VLIW processing elements (PEs) and the custom hardware units. They also correctly identify the register file as the performance-limiting resource in an FPGA implementation, and provide substantial microarchitecture performance data.

In the remaining sections, the authors discuss zero-overhead hardware/software switching, the hardware functions, and the software tool chain. They performed design, validation, and FPGA implementation, and achieved 167 megahertz (MHz) on an Altera Stratix, which is an impressive clock speed for a programmable device. Finally, they report on application speedups for both their standalone VLIW engine and their four-wide VLIW, augmented with hardware functions. Results range from nine percent to 230 times for kernel acceleration, which is indeed impressive.

Overall, this is a thorough account of the proposed field of research; the authors did their best to disclose as much information as possible in the context of a conference paper. I was very much impressed with the technical ability of all those involved. This is a solid paper on embedded central processing unit (CPU) architecture.

Online Computing Reviews Service
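The 8R/4W figure in the review follows from simple port counting: if each of the four VLIW functional units reads two source registers and writes one result per cycle, the shared register file must serve eight reads and four writes at once. The sketch below models that bookkeeping. The `Op` class, the `ports_needed` and `fits` helpers, and the two-source, one-destination operation format are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Op:
    """One operation slot in a VLIW bundle (hypothetical model)."""
    dest: int               # destination register index
    srcs: Tuple[int, ...]   # source register indices


def ports_needed(bundle: Sequence[Op]) -> Tuple[int, int]:
    """Count the register-file reads and writes a bundle performs in one cycle."""
    reads = sum(len(op.srcs) for op in bundle)
    writes = len(bundle)  # assumes each operation writes exactly one result
    return reads, writes


def fits(bundle: Sequence[Op], read_ports: int = 8, write_ports: int = 4) -> bool:
    """True if the bundle can issue in one cycle within the given port budget."""
    reads, writes = ports_needed(bundle)
    return reads <= read_ports and writes <= write_ports


# Four two-source operations, one per functional unit.
bundle = [Op(dest=d, srcs=(2 * d, 2 * d + 1)) for d in range(4)]
print(ports_needed(bundle))  # (8, 4)
print(fits(bundle))          # True
```

Adding a fifth operation to the bundle would exceed both budgets, which is one way to see why the port count, and with it the cost of the register file, grows with the issue width.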

            • Published in

              FPGA '05: Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
              February 2005
              288 pages
              ISBN: 1595930299
              DOI: 10.1145/1046192

              Copyright © 2005 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate: 125 of 627 submissions, 20%
