Design and formal verification of a hierarchical cache coherence protocol for NoC based multiprocessors

Kapoor, Hemangee K.; Kanakala, Praveen; Verma, Malti; Das, Shirshendu

doi:10.1007/s11227-012-0865-8

Published: 18 January 2013

Volume 65, pages 771–796 (2013)

The Journal of Supercomputing

Abstract

Advances in semiconductor technology allow more and more processing cores to be packed on a single die, and scalable directory-based protocols are needed to maintain cache coherence. Most currently available directory-based protocols are designed for mesh-based topologies and suffer from delay and scalability problems. A cluster-based coherence protocol is a better option than a flat directory-based protocol, but the problems of the mesh-based topology still exist. A tree-based topology, on the other hand, requires fewer hops than a mesh-based topology.

In this paper we present a hierarchical cache coherence protocol for a tree-based topology. We divide the processing cores into clusters, and each cluster shares a higher-level cache. At the next level we form clusters of caches connected to yet another higher-level cache. This continues up to the top-level cache/memory. We describe various architectural placements that can benefit from the protocol, compare hop counts, and give the memory overhead requirements. Finally, we formally verify the protocol using the Murϕ tool.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, 781 039, India
Hemangee K. Kapoor, Praveen Kanakala, Malti Verma & Shirshendu Das

Corresponding author

Correspondence to Hemangee K. Kapoor.

Appendix: Example illustrating the protocol

Figures 5 to 9 show successive steps and the state of the sample topology as various nodes read or write a given block. L2-1 to L2-4 are assumed to be different L2 caches (not cache banks). Two such L2 caches form one cluster connected to L3-1, and so on. The state of the block is shown next to each cache. Light filled nodes hold or request read-only copies; dark filled nodes are writers.

In the beginning, node-1 performs a read. It acquires a copy of the block from memory, and all caches along the hierarchy enter state 001 (Fig. 6).

Now suppose node-2 wants to write. It sends a write request to L2-1. Because L2-1 has the block (state = 001), it invalidates the other copies in the cluster [2: inv sent to node-1]. As we assume write-through within a cluster, L2-1 will hold a dirty copy, and its parent L3-1 must be informed of this [2: blk-dirty message]. Similarly, L3-1 informs its parent, Memory, about blk-dirty. This message changes the state of L3-1 and Memory to 1X1, which indicates that some child node holds a modified copy of the block. After L2-1 receives all the ack messages, it sends the block to node-2 for writing (Fig. 7).

Using this as the start state, we illustrate the procedure for a read request from node-5 in Fig. 8 and a subsequent write request from node-9 in Fig. 9.
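
The first two steps of the walkthrough (the read by node-1 and the write by node-2) can be sketched roughly as follows. This is a minimal illustration, not the protocol as specified in the paper: the `Cache` class, the `read` and `write` functions, the `dirty` flag, and the message tuples are hypothetical names introduced here. The state labels 001 and 1X1 are treated as opaque strings taken from the text, `None` stands for "no valid copy", and the point at which the ack is logged is an assumption, since the text only says that L2-1 waits for all acks before sending the block. Only the part of the topology needed for these two steps is built.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Message = Tuple[str, str, str]  # (kind, sender, receiver)


@dataclass(eq=False)
class Cache:
    name: str
    parent: Optional["Cache"] = field(default=None, repr=False)
    children: List["Cache"] = field(default_factory=list)
    state: Optional[str] = None  # opaque label from the text; None = no copy
    dirty: bool = False          # hypothetical flag, not a state from the paper

    def attach(self, child: "Cache") -> "Cache":
        child.parent = self
        self.children.append(child)
        return child


def read(node: Cache) -> None:
    """First read: every cache on the path up to memory ends in state 001."""
    cur: Optional[Cache] = node
    while cur is not None:
        cur.state = "001"
        cur = cur.parent


def write(node: Cache) -> List[Message]:
    """Write request served by a cluster cache that already holds the block."""
    log: List[Message] = []
    cluster = node.parent
    assert cluster is not None and cluster.state == "001"
    # Invalidate the other copies inside the cluster.
    for sibling in cluster.children:
        if sibling is not node and sibling.state is not None:
            log.append(("inv", cluster.name, sibling.name))
            sibling.state = None
            log.append(("ack", sibling.name, cluster.name))  # assumed timing
    # The cluster cache now holds a dirty copy; inform each ancestor in turn.
    cluster.dirty = True
    child, parent = cluster, cluster.parent
    while parent is not None:
        log.append(("blk-dirty", child.name, parent.name))
        parent.state = "1X1"
        child, parent = parent, parent.parent
    # Only after the acks does the block go to the writer.
    log.append(("blk", cluster.name, node.name))
    return log


# Path used in the walkthrough: Memory -> L3-1 -> L2-1 -> {node-1, node-2}.
memory = Cache("Memory")
l3_1 = memory.attach(Cache("L3-1"))
l2_1 = l3_1.attach(Cache("L2-1"))
node1 = l2_1.attach(Cache("node-1"))
node2 = l2_1.attach(Cache("node-2"))

read(node1)
messages = write(node2)
for kind, src, dst in messages:
    print(f"{kind}: {src} -> {dst}")
```

The sketch leaves the state labels of L2-1 and node-2 untouched after the write, because the text above does not give their encoding; it records only what the walkthrough states explicitly.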

About this article

Cite this article

Kapoor, H.K., Kanakala, P., Verma, M. et al. Design and formal verification of a hierarchical cache coherence protocol for NoC based multiprocessors. J Supercomput 65, 771–796 (2013). https://doi.org/10.1007/s11227-012-0865-8

Issue Date: August 2013