DP&TB: a coherence filtering protocol for many-core chip multiprocessors

The Journal of Supercomputing

Abstract

Future many-core chip multiprocessors (CMPs) will integrate hundreds of processor cores on chip. Two cache coherence protocols dominate current CMPs. The token-based protocol (Token) provides high performance, but it generates a prohibitive amount of network traffic, which translates into excessive power consumption. The directory-based protocol (Directory) reduces network traffic, but it incurs the storage overhead of the directory and delivers comparatively low performance because of indirection, which limits its applicability to many-core CMPs.

In this work, we present DP&TB, a novel cache coherence protocol particularly suited to future many-core CMPs. In DP&TB, cache coherence is maintained at the granularity of a page, which makes it possible to filter out unnecessary coherence inspections for blocks inside private pages and network traffic for blocks inside shared pages. We employ Directory to detect private and shared pages and Token to maintain the coherence of the blocks inside shared pages. DP&TB inherits the merits of Directory and Token and overcomes their problems. Experimental results show that DP&TB outperforms both Directory and Token overall, with a 9.1 % improvement in performance over Token and a 13.8 % improvement in network traffic over Directory. In addition, the storage overhead of DP&TB is less than half that of Directory. Our proposal can meet the requirements of many-core CMPs for high performance and for power and area efficiency.
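The page-granularity filtering idea can be illustrated with a small sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the class name `PageFilter`, the method `targets`, the 4 KiB page size, and the choice to send requests for a shared page only to the cores that have already touched it. It is not the authors' protocol, and it omits what a real design would need when a page changes from private to shared.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


@dataclass
class PageFilter:
    """Hypothetical page-level classifier, standing in for the directory's role."""

    # Maps page number -> set of core ids that have accessed the page.
    pages: dict = field(default_factory=dict)

    def targets(self, core: int, addr: int) -> set:
        """Return the cores that should receive a coherence request.

        An empty set means the page is private to `core`, so the
        block-level coherence check can be skipped entirely.
        """
        page = addr // PAGE_SIZE
        sharers = self.pages.setdefault(page, set())
        sharers.add(core)
        if len(sharers) == 1:
            # Private page: only this core has ever touched it.
            return set()
        # Shared page: in this sketch the request goes only to the other
        # cores that touched the page, rather than to every core.
        return sharers - {core}


f = PageFilter()
print(f.targets(0, 0x1000))  # core 0 touches page 1 first: private
print(f.targets(1, 0x1080))  # core 1 touches the same page: now shared
print(f.targets(0, 0x1000))  # core 0 must now reach core 1
```

In this toy model, accesses to private pages generate no coherence messages at all, and accesses to shared pages reach only a subset of cores, which mirrors the two kinds of filtering the abstract describes.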


Author information

Correspondence to Fengkai Yuan.

About this article

Cite this article

Yuan, F., Ji, Z. DP&TB: a coherence filtering protocol for many-core chip multiprocessors. J Supercomput 66, 249–261 (2013). https://doi.org/10.1007/s11227-013-0900-4
