DOI: 10.1145/1944862.1944889
research-article

Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Published: 24 January 2011

Abstract

This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in how heavily different cache sets are used. To reduce misses caused by destructive interference, CE decouples the physical locations of cache blocks from their addresses. Temporal pressure at the on-chip last-level cache is continuously collected at the granularity of a group (a collection of cache sets) and periodically recorded at the memory controller to guide placement. An incoming block is then placed in the cache group that exhibits the minimum pressure. Results from a full-system simulator show that, for the benchmark programs we examined, CE reduces the L2 miss rate by 13.6% on average, and by as much as 46.7%, relative to a shared NUCA scheme. The evaluation also shows that CE outperforms related cache designs.
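The placement loop summarized in the abstract can be illustrated with a small sketch. This is not the paper's hardware design, only a toy model under stated assumptions: the abstract does not define how pressure is measured, so the sketch assumes a simple per-group event counter (for example, misses); resetting the live counters at each epoch, breaking ties toward the lowest group index, and using a dictionary as the address-to-group map are likewise assumptions. The class and method names are hypothetical.

```python
class PressureAwarePlacer:
    """Toy model of group-granularity, pressure-aware block placement."""

    def __init__(self, num_groups):
        # Live counters, updated continuously as the cache operates
        # (assumption: one integer counter per group).
        self.live = [0] * num_groups
        # Snapshot recorded periodically at the memory controller;
        # placement decisions read only this snapshot.
        self.recorded = [0] * num_groups
        # Block location is decoupled from the address, so a lookup
        # structure is needed (assumption: a plain address -> group map).
        self.location = {}

    def note_pressure(self, group, amount=1):
        """Accumulate pressure observed at one cache group."""
        self.live[group] += amount

    def end_epoch(self):
        """Record the live counters at the controller and start a new epoch."""
        self.recorded = list(self.live)
        self.live = [0] * len(self.live)

    def place(self, block_addr):
        """Place an incoming block in the group with minimum recorded pressure."""
        group = min(range(len(self.recorded)), key=self.recorded.__getitem__)
        self.location[block_addr] = group
        return group

    def lookup(self, block_addr):
        """Return the group holding the block, or None if it is not cached."""
        return self.location.get(block_addr)
```

Because decisions use the recorded snapshot rather than the live counters, a block that arrives mid-epoch is placed according to the pressure observed in the previous epoch; that lag is the cost of recording only periodically.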

    Published In

    HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
    January 2011
    226 pages
    ISBN: 9781450302418
    DOI: 10.1145/1944862
    Sponsors

    • HiPEAC: HiPEAC Network of Excellence

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. chip multiprocessors
    2. group-based placement
    3. pressure-aware placement
    4. private cache
    5. shared cache
