Abstract:
Coarse-grained reconfigurable arrays (CGRAs) can be dynamically programmed by configuration contexts to run multiple operations concurrently on an array of processing elements (PEs). This parallelism further widens the gap between the off-chip memory bandwidth that CGRAs demand and the limited speed of off-chip memory access. Cache prefetching is widely used to mitigate off-chip memory latency. However, directly applying existing prefetching techniques, which primarily target instruction-driven processors, to CGRAs may produce inaccurate prefetches and thereby cripple CGRA performance. Exploiting the repeated execution of contexts in CGRA computing, this paper proposes a context-directed pattern matching (CDPM) mechanism to improve prefetching accuracy for CGRAs. CDPM generates a prefetch pattern when a context is first executed and then reuses that pattern to issue prefetch requests when the context is re-executed. To eliminate outdated prefetch patterns, CDPM also evaluates the prefetching accuracy of each pattern at run time by adding prefetch addresses to a Bloom filter. The distinguishing feature of CDPM is its use of the CGRA configuration context as a guide for improving prefetching accuracy. Experimental results showed that CDPM prefetching improved performance by an average of 31.1% compared with no prefetching and by 7.7% compared with state-of-the-art cache prefetching techniques, while incurring only slight area and power overheads.
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 37, Issue: 6, June 2018)
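The abstract describes CDPM only at a high level: learn a prefetch pattern on a context's first execution, replay it on re-execution, and use a Bloom filter of prefetched addresses to detect and discard outdated patterns. The Python sketch below is a hypothetical software model of that control flow, not the paper's hardware design. Everything beyond the abstract is an assumption: patterns stored as address offsets from the context's first demand access, all prefetches issued at that first access, the 0.5 accuracy threshold, the Bloom filter size and hash scheme, and all class and method names (`ContextDirectedPrefetcher`, `begin_context`, `on_access`, `end_context`).

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Small Bloom filter: k hash functions over an m-bit array (no deletions)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, addr: int):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, addr: int) -> None:
        for pos in self._positions(addr):
            self.bits[pos] = 1

    def might_contain(self, addr: int) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(addr))

    def clear(self) -> None:
        self.bits = bytearray(self.num_bits)


class ContextDirectedPrefetcher:
    """Hypothetical model: one learned prefetch pattern per configuration context."""

    def __init__(self, accuracy_threshold: float = 0.5) -> None:
        self.patterns: dict[int, list[int]] = {}  # context id -> address offsets
        self.accuracy_threshold = accuracy_threshold  # assumed value
        self.filter = BloomFilter()
        self._context: int | None = None
        self._base: int | None = None
        self._recording: list[int] | None = None
        self._issued = 0
        self._useful = 0

    def begin_context(self, context_id: int) -> None:
        self._context = context_id
        self._base = None
        self._issued = self._useful = 0
        self.filter.clear()
        # Learn a new pattern only if none exists (first run, or it was discarded).
        self._recording = None if context_id in self.patterns else []

    def on_access(self, addr: int) -> list[int]:
        """Handle one demand access and return the addresses to prefetch now."""
        prefetches: list[int] = []
        if self._base is None:
            # Assumption: the first demand access anchors the pattern.
            self._base = addr
            pattern = self.patterns.get(self._context)
            if pattern is not None:
                offsets = [off for off in dict.fromkeys(pattern) if off != 0]
                prefetches = [self._base + off for off in offsets]
                for p in prefetches:
                    self.filter.add(p)  # remember what was prefetched
                self._issued = len(prefetches)
        elif self.filter.might_contain(addr):
            self._useful += 1  # demand access hit a (probably) prefetched address
        if self._recording is not None:
            self._recording.append(addr - self._base)
        return prefetches

    def end_context(self) -> float | None:
        """Store a newly learned pattern, or score a reused one and drop it if outdated."""
        if self._recording is not None:
            self.patterns[self._context] = self._recording
            return None
        if self._issued == 0:
            return None
        # Repeated hits on one address are not deduplicated, so cap the ratio at 1.
        accuracy = min(self._useful, self._issued) / self._issued
        if accuracy < self.accuracy_threshold:
            del self.patterns[self._context]  # relearn on the next execution
        return accuracy
```

In this model, a context whose second execution touches `200, 204, 208, 212` after learning `100, 104, 108, 112` receives prefetches for `204, 208, 212` at its first access and scores an accuracy of 1.0. A later execution with an unrelated access stream scores low, and its pattern is discarded. Using a Bloom filter instead of an exact address set keeps the accuracy check to a fixed-size bit array, at the cost of occasional false-positive "useful" counts.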