Abstract:
A coarse-grained reconfigurable array (CGRA) is an attractive architecture in terms of both performance and flexibility. However, because its performance gains come from exploiting parallelism, the architecture typically handles control flow poorly, since control flow is sequential in nature. Many attempts have been made to overcome this problem with predicated execution techniques; however, they either do not support all types of control flow or suffer performance degradation when they do. In addition, predicated execution schemes generally require a longer execution time because both the if- and else-paths are always executed. This paper proposes advanced predicated execution techniques that can handle and accelerate all types of control flow with only 2% hardware overhead. These techniques can also be easily extended to general SIMD machines. We implemented these techniques on a coarse-grained reconfigurable array architecture and verified their functionality and effectiveness by accelerating an H.264 deblocking filter, a kernel that is both data- and control-intensive. The results show that the proposed approach achieves up to a 43% improvement in execution time compared to speculation, at the cost of a 76% increase in code size, and a 24% improvement in execution time compared to the previous full predication approach, with a smaller code size.
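
To make the cost of conventional full predication concrete, the following is a minimal sketch of if-conversion on a small data-parallel loop. It illustrates only the generic baseline the abstract criticizes (both paths computed for every element, then selected by a predicate); it is not the technique proposed in the paper. The function names, the threshold test, and the arithmetic in each path are hypothetical and chosen purely for illustration.

```python
def branchy(xs, threshold):
    """Reference version: ordinary control flow, one path per element."""
    out = []
    for x in xs:
        if x > threshold:
            out.append(x - threshold)  # if-path (hypothetical work)
        else:
            out.append(x * 2)          # else-path (hypothetical work)
    return out


def predicated(xs, threshold):
    """If-converted version: every lane computes both paths, then selects."""
    pred = [x > threshold for x in xs]        # one predicate per lane
    then_vals = [x - threshold for x in xs]   # if-path, executed on all lanes
    else_vals = [x * 2 for x in xs]           # else-path, executed on all lanes
    # The predicate picks which result each lane keeps.
    return [t if p else e for p, t, e in zip(pred, then_vals, else_vals)]


if __name__ == "__main__":
    data = [1, 5, 9]
    print(branchy(data, 4))
    print(predicated(data, 4))
```

Both functions return the same values, but the predicated form performs the work of both paths on every element, which is the source of the longer execution time mentioned above.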
Date of Conference: 08-10 December 2010
Date Added to IEEE Xplore: 06 January 2011
ISBN Information: