Abstract:
Field Programmable Gate Arrays (FPGAs) can be customized into application-specific architectures to achieve high performance and energy efficiency. Unfortunately, they have yet to gain significant adoption among application developers because of their low-level programming model. Moreover, obtaining good performance from an FPGA design often requires correctly parallelizing the computation and balancing computational throughput against the available data access bandwidth. To address the programming model problem, recent efforts have focused on composing applications out of parallel computational patterns, such as map, reduce, zipWith and foreach, and on leveraging the properties of these patterns to generate highly parallel, high-performance hardware modules. In this work, we focus on improving performance further. We show that knowledge of how these computational patterns consume and produce data, combined with information about the system architecture, can be used to automatically parallelize computations across multiple hardware modules. To achieve this, we automatically infer the synchronization requirements that arise from parallelization and generate a complete system that obtains high performance for a given application. We evaluate our approach using seven applications from different domains and show that our automatically generated designs achieve performance improvements ranging from 1.8 to 9.4 times.
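As an illustration of the idea only, the following Python sketch models in software how a computation built from zipWith and reduce might be split across several modules and then recombined. It is not the paper's toolchain or generated hardware: the names `zip_with`, `split`, `dot_sequential` and `dot_parallel`, the dot-product workload, and the contiguous chunking scheme are all assumptions made for this example.

```python
from functools import reduce
from operator import add, mul


def zip_with(f, xs, ys):
    """zipWith pattern: apply f element-wise to two sequences."""
    return [f(x, y) for x, y in zip(xs, ys)]


def split(xs, n_modules):
    """Split xs into n_modules contiguous chunks of near-equal size.

    Contiguous chunking is an assumption of this sketch; it stands in for
    whatever data partitioning a real system would choose.
    """
    size, extra = divmod(len(xs), n_modules)
    chunks, start = [], 0
    for i in range(n_modules):
        end = start + size + (1 if i < extra else 0)
        chunks.append(xs[start:end])
        start = end
    return chunks


def dot_sequential(xs, ys):
    """Dot product as a composition of zipWith (multiply) and reduce (add)."""
    return reduce(add, zip_with(mul, xs, ys), 0)


def dot_parallel(xs, ys, n_modules):
    """Same computation, partitioned across n_modules hypothetical modules.

    Each chunk pair is independent, so each could run on its own module.
    The final reduce is the synchronization point: every partial result
    must be available before the combined value can be produced.
    """
    partials = [
        dot_sequential(cx, cy)
        for cx, cy in zip(split(xs, n_modules), split(ys, n_modules))
    ]
    return reduce(add, partials, 0)
```

Because addition is associative, the partitioned version returns the same value as the sequential one. The zipWith stage needs no coordination between modules, whereas the reduce stage forces a combine step, which is the kind of synchronization requirement the abstract refers to.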
Date of Conference: 02-04 September 2015
Date Added to IEEE Xplore: 08 October 2015
Electronic ISBN: 978-0-9934-2800-5