DOI: 10.1145/2925426.2926270

Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures

Published: 01 June 2016

Abstract

As we enter the many-core era, providing the shared-memory abstraction through cache coherence has become progressively more difficult. The de facto standard, directory-based cache coherence, has been studied extensively, but it does not scale well with increasing core count. Recently introduced timestamp-based hardware coherence protocols offer an attractive alternative. In this paper, we propose a timestamp-based coherence protocol, called TC-Release++, that addresses the scalability issues of supporting cache coherence efficiently in large-scale systems.
Our approach is inspired by TC-Weak, a recently proposed timestamp-based coherence protocol targeting GPU architectures. We first design TC-Release coherence as a straightforward port of TC-Weak to general-purpose many-cores. However, re-purposing TC-Weak for general-purpose many-core architectures is challenging because of significant differences in both the architecture and the programming model. Indeed, TC-Release performs worse than conventional directory coherence protocols. We overcome the limitations and overheads of TC-Release by introducing simple hardware support to eliminate frequent memory stalls and an optimized lifetime prediction mechanism to improve cache performance. The resulting optimized protocol, TC-Release++, is highly scalable (the coherence overhead per last-level cache line scales logarithmically with core count, as opposed to linearly for directory coherence) and achieves a 3.0% better execution time and comparable network traffic (within 1.3%) relative to the baseline MESI directory coherence protocol.
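
The scaling claim can be illustrated with a rough back-of-the-envelope sketch. It compares a full bit-vector directory (one sharer bit per core) against a hypothetical timestamp-based entry. The 32-bit timestamp width and the owner-ID field are assumptions made for illustration only; the abstract does not give the actual TC-Release++ entry layout.

```python
def full_map_directory_bits(cores: int) -> int:
    """Per-line overhead of a full bit-vector directory: one sharer bit per core."""
    return cores


def timestamp_overhead_bits(cores: int, ts_bits: int = 32) -> int:
    """Assumed per-line overhead of a timestamp-based entry.

    Hypothetical layout (not taken from the paper): a fixed-width
    timestamp plus an owner ID wide enough to name any core.
    """
    owner_id_bits = max(1, (cores - 1).bit_length())  # ceil(log2(cores))
    return ts_bits + owner_id_bits


if __name__ == "__main__":
    for cores in (16, 64, 256, 1024):
        print(cores, full_map_directory_bits(cores), timestamp_overhead_bits(cores))
```

Under these assumed widths, the directory entry is smaller at low core counts and grows linearly, while the timestamp entry grows by only one bit each time the core count doubles.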

Cited By

  • (2021) Zero Directory Eviction Victim: Unbounded Coherence Directory and Core Cache Isolation. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 277-290. DOI: 10.1109/HPCA51647.2021.00032. Online publication date: Feb-2021
  • (2019) Enabling Predictable, Simultaneous and Coherent Data Sharing in Mixed Criticality Systems. 2019 IEEE Real-Time Systems Symposium (RTSS), pp. 433-445. DOI: 10.1109/RTSS46320.2019.00045. Online publication date: Dec-2019
  • (2016) A Time-Space Attribute-Based Evidence Fixing Method in Digital Forensics. 2016 Third International Conference on Trustworthy Systems and their Applications (TSA), pp. 127-131. DOI: 10.1109/TSA.2016.30. Online publication date: Sep-2016

    Published In

    ICS '16: Proceedings of the 2016 International Conference on Supercomputing
    June 2016
    547 pages
    ISBN: 9781450343619
    DOI: 10.1145/2925426
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICS '16

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 28 Feb 2025
