skip to main content
10.1145/2541940.2541982acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

High-performance fractal coherence

Published: 24 February 2014 Publication History

Abstract

Bugs in cache coherence protocols can cause system failures. Despite many advances, verification runs into state explosion for even moderately-sized systems. As multicores' core counts increase, coherence verifiability continues to be a key problem. A recent proposal, called fractal coherence, avoids the state explosion problem by applying the idea of observational equivalence between a larger system and its smaller sub-systems. A fractal protocol for a larger system is verified by design if a minimal sub-system is verified completely. While fractal coherence is a significant step forward, there are two shortcomings: (1) Architectural limitation: To achieve fractal coherence's logical hierarchy, TreeFractal, the specific fractal protocol, employs a tree architecture where each miss traverses many levels up and down the tree and each level redundantly holds its sub-trees' coherence tags. (2) Protocol restrictions: TreeFractal imposes a restriction on responses to read requests that forces read requests to obtain clean blocks from the nearest sharer even if the shared L2 or L3 is faster. These limitations impose significant performance and coherence tag state overheads. In this paper, we propose architectural support for coherence protocols to achieve scalable performance and verifiability. To address the architectural limitation, we propose FlatFractal, a directory-based architecture which decouples fractal coherence's logical hierarchy from the architecture and eliminates redundant tag state. To address the protocol restriction, we propose a simple change to the protocol that, while preserving observational equivalence, allows read requests to obtain the blocks from the shared L2 or L3. Our simulations show that for 16 cores, FlatFractal performs, on average, 57% better than TreeFractal and within 3% of a conventional directory.

References

[1]
Open source development labs database test suite 2 v0.40 http://osdldbt.sourceforge.net/.
[2]
Postgresql. v9.2.0. http://www.postgresql.org/.
[3]
The standard performance evaluation corporation. SPECjbb2005 suite. http://www.spec.org/jbb2005/.
[4]
A. R. Alameldeen and D. A. Wood. Variability in architectural simulations of multi-threaded workloads. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture, 2003.
[5]
AMD. Revision Guide for AMD Family 14H Models 00h-0Fh Processors, revision 3.,18. February 2013.
[6]
P. Barford and M. Crovella. Generating representative web workloads for network and server performance evaluation. In Proceedings of the 1998 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 151--160, 1998. ACM.
[7]
B. Bentley. Simulation Driven Verification. Presentation at Design Automation Summer School. 2009.
[8]
D. Culler, J. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1st edition, 1998. The Morgan Kaufmann Series in Computer Architecture and Design.
[9]
D. Dill, A. Drexler, A. Hu, and C. Yang. Protocol verification as a hardware design aid. In Proceedings of the IEEE 1992 International Conference on Computer Design: VLSI in Computers and Processors, pages 522--525, 1992.
[10]
A. Felty and F. Stomp. A correctness proof of a cache coherence protocol. In Proceedings of the 11th Annual Conference on Computer Assurance, pages 128--141, Oct 1992.
[11]
H. Garavel, F. Lang, R. Mateescu, and W. Serwe. CADP 2010: A toolbox for the construction and analysis of distributed processes. In P. Abdulla and K. Leino, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 6605 of Lecture Notes in Computer Science, pages 372--387. Springer Berlin / Heidelberg, 2011. 10.1007/978-3-642-19835-9_33.
[12]
S. Gjessing, S. Krogdahl, and E. Munthe-Kaas. A linked list cache coherence protocol: verifying the bottom layer. In Proceedings of the 5th International Parallel Processing Symposium, pages 324--329, 1991.
[13]
S. Gjessing, S. Krogdahl, and E. Munthe-Kaas. A top down approach to the formal specification of SCI cache coherence. In Computer Aided Verification, 3rd International Workshop, pages 83--91. Springer, 1991.
[14]
Intel. Intel Xeon Processor E7--800/4800/2800 Product Families Specification Update. August 2013.
[15]
C. N. Ip and D. L. Dill. Better verification through symmetry. Formal Methods in System Design, 9(1-2):41--75, Aug. 1996.
[16]
J. Laudon and D. Lenoski. The SGI Origin: a ccNUMA highly scalable server. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 241--251, 1997. ACM.
[17]
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, 1990. ACM.
[18]
P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50--58, Feb. 2002.
[19]
M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Computer Architecture News, 33(4):92--99, Nov. 2005.
[20]
M. R. Marty and M. D. Hill. Virtual hierarchies to support server consolidation. In Proceedings of the 34th Annual International Symposium on Computer Architecture, pages 46--56, New York, NY, USA, 2007. ACM.
[21]
R. Milner. A Calculus of Communicating Systems. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1982.
[22]
S. Park and D. L. Dill. Verification of FLASH cache coherence protocol by aggregation of distributed transactions. In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 288--296, 1996. ACM.
[23]
F. Pong and M. Dubois. The verification of cache coherence protocols. In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 11--20, 1993. ACM.
[24]
F. Pong and M. Dubois. Verification techniques for cache coherence protocols. ACM Computing Surveys, 29(1):82--126, Mar. 1997.
[25]
F. Pong and M. Dubois. Formal verification of complex coherence protocols using symbolic state models. Journal of the ACM, 45(4):557--587, July 1998.
[26]
F. Pong, A. Nowatzyk, G. Aybay, and M. Dubois. Verifying distributed directory-based cache coherence protocols: S3.mp, a case study. In Euro-Par '95: Parallel Processing, volume 966 of Lecture Notes in Computer Science, pages 287--300, Springer Berlin/Heidelberg, 1995.
[27]
U. Stern and D. Dill. Automatic verification of the SCI cache coherence protocol. In Correct Hardware Design and Verification Methods: IFIP WG10.5 Advanced Research Working Conference Proceedings, 21--34, 1995.
[28]
S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. Cacti 5.1. Technical report, HP Technical Report, 2008.
[29]
C. tsun Chou, P. K. Mannava, and S. Park. A simple method for parameterized verification of cache coherence protocols. In Formal Methods in Computer Aided Design, pages 382--398. Springer, 2004.
[30]
Apache HTTP Server v2.2.11. The apache software foundation. http://httpd.apache.org/.
[31]
K. Williams and R. Esser. Verification of the Futurebus+ cache coherence protocol: a case study in model checking. In Proceedings of the 27th Australasian Conference on Computer Science -- Volume 26, pages 65--71, Darlinghurst, Australia, Australia, 2004. Australian Computer Society, Inc.
[32]
A. W. Wilson, Jr. Hierarchical cache/bus architecture for shared memory multiprocessors. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 244--252, 1987. ACM.
[33]
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24--36, 1995. ACM.
[34]
P. Yeung and K. Larsen. Practical assertion-based formal verification for SoC designs. In Proceedings of the 2005 International Symposium on System-on-Chip, pages 58--61, 2005.
[35]
M. Zhang, A. R. Lebeck, and D. J. Sorin. Fractal coherence: Scalably verifiable cache coherence. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pages 471--482, Washington, DC, USA, 2010. IEEE Computer Society.

Cited By

View all
  • (2019)Directed Test Generation for Validation of Cache Coherence ProtocolsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2018.280123938:1(163-176)Online publication date: Jan-2019
  • (2019)Autonomous Data-Race-Free GPU Testing2019 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC47752.2019.9042019(81-92)Online publication date: Nov-2019
  • (2017)Architecting hierarchical coherence protocols for push-button parametric verificationProceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3123939.3123971(477-489)Online publication date: 14-Oct-2017
  • Show More Cited By

Index Terms

  1. High-performance fractal coherence

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
    February 2014
    780 pages
    ISBN:9781450323055
    DOI:10.1145/2541940
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 February 2014

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. cache coherence
    2. multicore
    3. verification

    Qualifiers

    • Research-article

    Conference

    ASPLOS '14

    Acceptance Rates

    ASPLOS '14 Paper Acceptance Rate 49 of 217 submissions, 23%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 03 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Directed Test Generation for Validation of Cache Coherence ProtocolsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2018.280123938:1(163-176)Online publication date: Jan-2019
    • (2019)Autonomous Data-Race-Free GPU Testing2019 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC47752.2019.9042019(81-92)Online publication date: Nov-2019
    • (2017)Architecting hierarchical coherence protocols for push-button parametric verificationProceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3123939.3123971(477-489)Online publication date: 14-Oct-2017
    • (2016)Verifiable hierarchical protocols with network invariants on parametric systemsProceedings of the 16th Conference on Formal Methods in Computer-Aided Design10.5555/3077629.3077650(101-108)Online publication date: 3-Oct-2016
    • (2016)Verifiable hierarchical protocols with network invariants on parametric systems2016 Formal Methods in Computer-Aided Design (FMCAD)10.1109/FMCAD.2016.7886667(101-108)Online publication date: Oct-2016
    • (2014)Fractal++Proceeding of the 41st annual international symposium on Computer architecuture10.5555/2665671.2665733(409-420)Online publication date: 14-Jun-2014
    • (2014)Fractal++ACM SIGARCH Computer Architecture News10.1145/2678373.266573342:3(409-420)Online publication date: 14-Jun-2014
    • (2014)Revisiting the Complexity of Hardware Cache Coherence and Some ImplicationsACM Transactions on Architecture and Code Optimization10.1145/266334511:4(1-22)Online publication date: 8-Dec-2014
    • (2014)Fractal++: Closing the performance gap between fractal and conventional coherence2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)10.1109/ISCA.2014.6853211(409-420)Online publication date: Jun-2014
    • (2014)PVCoherence: Designing flat coherence protocols for scalable verification2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA.2014.6835949(392-403)Online publication date: Feb-2014

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media