ABSTRACT
Coarse-grained reconfigurable arrays (CGRAs) are promising accelerators because they are more energy-efficient than FPGAs. Existing CGRAs make operations reconfigurable but not communication: their static neighbor-to-neighbor interconnect causes performance loss and complicates the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect that provides single-cycle communication between distant functional units (FUs). This leads to a new formulation of the application mapping problem and, in turn, to the design of an efficient compiler. HyCUBE achieves 1.5X and 3X better performance-per-watt than a CGRA with a standard NoC and a CGRA with neighbor-to-neighbor connectivity, respectively.