DOI: 10.1145/2897937.2898001
research-article

Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays

Published: 05 June 2016

Abstract

This paper proposes a context directed pattern matching (CDPM) mechanism, which uses the context of coarse-grained reconfigurable arrays (CGRAs) as a guide to improve cache prefetching accuracy. CDPM generates a prefetch pattern when a context is first executed and reuses that pattern to issue prefetch requests when the context is executed again on the CGRA. To eliminate outdated prefetch patterns, CDPM also evaluates each pattern's prefetching accuracy at run time. In experiments, CDPM improved performance by an average of 31.1% compared with no prefetching and by 7.7% compared with state-of-the-art prefetching techniques.
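
The three steps named in the abstract (learn a pattern on first execution, replay it on re-execution, drop it when its measured accuracy falls) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not say how a pattern is encoded or how accuracy is thresholded, so the offset-list encoding, the names `PatternEntry`, `ContextPatternTable`, and `run_context`, and the `min_accuracy` and `min_samples` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternEntry:
    offsets: list[int]  # assumed encoding: offsets from the first access
    issued: int = 0     # prefetches issued using this pattern
    useful: int = 0     # prefetches that matched a later demand access


class ContextPatternTable:
    """Toy model (hypothetical): one recorded pattern per context ID."""

    def __init__(self, min_accuracy: float = 0.5, min_samples: int = 8):
        self.min_accuracy = min_accuracy  # assumed eviction threshold
        self.min_samples = min_samples    # assumed warm-up before judging
        self.table: dict[int, PatternEntry] = {}

    def run_context(self, context_id: int, addresses: list[int]) -> list[int]:
        """Simulate one execution; return the addresses that were prefetched."""
        if not addresses:
            return []
        base = addresses[0]
        entry = self.table.get(context_id)
        if entry is None:
            # First execution: record the pattern, prefetch nothing.
            offsets = [a - base for a in addresses[1:]]
            self.table[context_id] = PatternEntry(offsets)
            return []
        # Repeat execution: replay stored offsets from the new base address.
        prefetched = [base + off for off in entry.offsets]
        demanded = set(addresses)
        entry.issued += len(prefetched)
        entry.useful += sum(1 for p in prefetched if p in demanded)
        # Run-time accuracy check: evict a pattern that has become outdated.
        enough = entry.issued >= self.min_samples
        if enough and entry.useful / entry.issued < self.min_accuracy:
            del self.table[context_id]
        return prefetched


table = ContextPatternTable(min_accuracy=0.5, min_samples=4)
table.run_context(7, [100, 104, 108, 112])        # learns offsets 4, 8, 12
hits = table.run_context(7, [200, 204, 208, 212])  # replays them from 200
```

In this model an evicted context is simply relearned the next time it runs, which is one plausible reading of "eliminate the outdated prefetch pattern"; the paper may handle replacement differently.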

Cited By

  • (2018) Dnestmap. Proceedings of the 55th Annual Design Automation Conference. DOI: 10.1145/3195970.3196027, pp. 1-6. Online publication date: 24-Jun-2018
  • (2018) HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing. IEEE Transactions on Circuits and Systems II: Express Briefs. DOI: 10.1109/TCSII.2017.2728814, 65:3, pp. 381-385. Online publication date: Mar-2018
  • (2018) DNestMap: Mapping Deeply-Nested Loops on Ultra-Low Power CGRAs. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC). DOI: 10.1109/DAC.2018.8465833, pp. 1-6. Online publication date: 24-Jun-2018

    Published In

    DAC '16: Proceedings of the 53rd Annual Design Automation Conference
    June 2016
    1048 pages
    ISBN: 9781450342360
    DOI: 10.1145/2897937
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. algorithm
    2. cache prefetching
    3. coarse-grained reconfigurable array (CGRA)
    4. context directed pattern matching (CDPM)

    Qualifiers

    • Research-article

    Conference

    DAC '16

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Citations

    • (2018) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems. DOI: 10.1007/978-3-319-91734-4_12, pp. 427-472. Online publication date: 14-Oct-2018
    • (2018) Lookahead Memory Prefetching for CGRAs Using Partial Loop Unrolling. Applied Reconfigurable Computing. Architectures, Tools, and Applications. DOI: 10.1007/978-3-319-78890-6_8, pp. 93-104. Online publication date: 8-Apr-2018
    • (2017) Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2017.2682241, 28:9, pp. 2471-2485. Online publication date: 1-Sep-2017
