DOI: 10.1145/2024724.2024937
research-article

A reuse-aware prefetching scheme for scratchpad memory

Published: 05 June 2011

Abstract

Scratchpad memory (SPM) has been used as a prefetch buffer in embedded systems and parallel architectures to hide memory access latency. However, the impact of reuse patterns on SPM prefetching has not been fully investigated. In this paper, we quantify the impact of reuse on SPM prefetching efficiency and propose a reuse-aware SPM prefetching (RASP) scheme. On average, RASP improves performance and energy by 15.9% and 22.0% over cache prefetching, by 12.9% and 31.2% over prefetch-only SPM management, and by 18.5% and 10% over DRDU [1] with SPM prefetching support.
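
To make the effect of reuse concrete, the following is a minimal sketch, not the paper's RASP algorithm. It counts off-chip element transfers for a sliding-window loop under two hypothetical SPM policies. The function names, the 1-D sliding-window access pattern, and the eviction rule are all assumptions chosen for illustration.

```python
def window(i, width):
    """Indices read by iteration i of a hypothetical sliding-window loop."""
    return set(range(i, i + width))


def transfers_prefetch_only(n_iters, width):
    """Assumed policy: each iteration prefetches its whole window into the SPM."""
    return sum(len(window(i, width)) for i in range(n_iters))


def transfers_reuse_aware(n_iters, width):
    """Assumed policy: fetch only elements not already resident in the SPM."""
    resident = set()
    total = 0
    for i in range(n_iters):
        needed = window(i, width)
        total += len(needed - resident)  # only the new elements are transferred
        resident = needed                # elements outside the window are evicted
    return total


if __name__ == "__main__":
    print(transfers_prefetch_only(100, 3))  # 300
    print(transfers_reuse_aware(100, 3))    # 102
```

With 100 iterations and a 3-element window, the prefetch-only policy moves 300 elements while the reuse-aware one moves 102, since consecutive windows overlap in two elements. With a window of width 1 there is no overlap and both policies move the same amount.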

References

[1] I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt, "DRDU: A Data Reuse Analysis Technique for Efficient Scratch-Pad Memory Management," in ACM Trans. Des. Autom. Electron. Syst., 2007.
[2] T. Chen, T. Zhang, Z. Sura, and M. Tallada, "Prefetching Irregular References for Software Cache on Cell," in Proc. CGO, 2008, pp. 155--164.
[3] R. Banakar, S. Steinke, B. Lee, M. Balakrishnan, and P. Marwedel, "Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems," in Proc. CODES, 2002, pp. 73--78.
[4] J. Sjodin and C. Platen, "Storage Allocation for Embedded Processors," in Proc. CASES, 2001, pp. 15--23.
[5] O. Avissar, R. Barua, and D. Stewart, "An Optimal Memory Allocation Scheme for Scratchpad-based Embedded Systems," in ACM Trans. Embed. Comput. Syst., 2002, pp. 6--26.
[6] M. Verma, S. Steinke, and P. Marwedel, "Data Partitioning for Maximal Scratchpad Usage," in Proc. ASPDAC, 2003, pp. 77--83.
[7] M. Kandemir, J. Ramanujam, J. Irwin, N. Vijaykrishnan, I. Kadayif, and A. Parikh, "Dynamic Management of Scratchpad Memory Space," in Proc. DAC, 2001, pp. 690--695.
[8] S. Udayakumaran and R. Barua, "Compiler-decided Dynamic Memory Allocation for Scratchpad Based Embedded Systems," in Proc. CASES, 2003, pp. 276--286.
[9] L. Li, H. Feng, and J. Xue, "Compiler-directed Scratchpad Memory Management via Graph Coloring," in ACM Trans. Archit. Code Optim., 2009, pp. 1--17.
[10] T. Yemliha, S. Srikantaiah, M. Kandemir, and O. Ozturk, "SPM Management Using Markov Chain Based Data Access Prediction," in Proc. ICCAD, 2008, pp. 565--569.
[11] A. Beric, R. Sethuraman, H. Peters, G. Veldman, J. Meerbergen, and G. Haan, "Streaming Scratchpad Memory Organization for Video Applications," in Proc. Circuits, Signals and Systems, 2004, pp. 427--432.
[12] T. Mowry, M. Lam, and A. Gupta, "Design and Evaluation of a Compiler Algorithm for Prefetching," in Proc. ASPLOS, 1992, pp. 62--73.
[13] S. Vanderwiel and D. Lilja, "Data Prefetch Mechanisms," in ACM Computing Surveys, 2000, pp. 174--199.
[14] R. M. Rabbah, H. Sandanagobalane, M. Ekpanyapong, and W. Wong, "Compiler Orchestrated Prefetching via Speculation and Predication," in Proc. ASPLOS, 2004, pp. 189--198.
[15] T. C. Mowry, "Tolerating Latency through Software-controlled Data Prefetching," Ph.D. dissertation, Stanford University, 1994.
[16] K. Kennedy and J. Allen, Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers Inc., 2002.
[17] ITK Software Guide, http://www.itk.org/ItkSoftwareGuide.pdf.
[18] M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories," in Proc. PPoPP, 2008, pp. 1--10.
[19] M. Kandemir and A. Choudhary, "Compiler-Directed Scratchpad Memory Hierarchy Design and Management," in Proc. DAC, 2002, pp. 628--633.
[20] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, "Simics: A Full System Simulation Platform," in IEEE Computer, 2002, pp. 50--58.
[21] M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, "Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset," in Computer Architecture News, 2005, pp. 92--99.
[22] Omega Library, http://www.cs.umd.edu/projects/omega.
[23] S. Li, J. Ahn, R. Strong, J. Brockman, D. Tullsen, and N. Jouppi, "McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multi-core and Many-core Architectures," in Proc. MICRO, 2009, pp. 469--480.
[24] Sun Microsystems, "UltraSPARC-II Enhancements: Support for Software Controlled Prefetch," White Paper, 1997.
[25] B. Egger, S. Kim, C. Jang, J. Lee, S. L. Min, and H. Shin, "Scratchpad Memory Management Techniques for Code in Embedded Systems without an MMU," in IEEE Trans. on Computers, vol. 59, no. 8, 2010, pp. 1047--1062.

    Published In

    DAC '11: Proceedings of the 48th Design Automation Conference
    June 2011
    1055 pages
    ISBN:9781450306362
    DOI:10.1145/2024724
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. prefetch
    2. reuse
    3. scratchpad memory

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Cited By

    • (2022) An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction. EURASIP Journal on Audio, Speech, and Music Processing 2022:1. DOI: 10.1186/s13636-022-00242-x. Online publication date: 16-May-2022
    • (2022) OverGen: Improving FPGA Usability through Domain-specific Overlay Generation. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 35-56. DOI: 10.1109/MICRO56248.2022.00018. Online publication date: Oct-2022
    • (2020) Overlapping host-to-device copy and computation using hidden unified memory. Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 321-335. DOI: 10.1145/3332466.3374531. Online publication date: 19-Feb-2020
    • (2019) Static code transformations for thread-dense memory accesses in GPU computing. Concurrency and Computation: Practice and Experience 32:5. DOI: 10.1002/cpe.5512. Online publication date: 18-Oct-2019
    • (2018) ShaVe-ICE. ACM Transactions on Embedded Computing Systems 17:2, pp. 1-25. DOI: 10.1145/3157667. Online publication date: 5-Feb-2018
    • (2017) Efficient Memory Partitioning for Parallel Data Access in FPGA via Data Reuse. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36:10, pp. 1674-1687. DOI: 10.1109/TCAD.2017.2648838. Online publication date: Oct-2017
    • (2017) High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze? IEEE Access 5, pp. 8419-8432. DOI: 10.1109/ACCESS.2016.2635378. Online publication date: 2017
    • (2016) Characterizing emerging heterogeneous memory. ACM SIGPLAN Notices 51:11, pp. 13-23. DOI: 10.1145/3241624.2926702. Online publication date: 14-Jun-2016
    • (2016) Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory--Based Architectures. ACM Transactions on Design Automation of Electronic Systems 22:1, pp. 1-25. DOI: 10.1145/2934680. Online publication date: 2-Sep-2016
    • (2016) Characterizing emerging heterogeneous memory. Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, pp. 13-23. DOI: 10.1145/2926697.2926702. Online publication date: 14-Jun-2016
