Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Abstract

Despite being quite popular during the 1990s because of their important advantages, linked cache coherence protocols have gone completely unnoticed in the multicore wave. In this work we bring them into the spotlight, demonstrating that they are a good alternative to other solutions currently being proposed. In particular, we consider the case for a simply-linked list-based cache coherence protocol and propose two techniques, namely Concurrent Replacements (CR) and Opportunistic Replacements (OR), aimed at palliating the negative effects of replacements of clean data. Through detailed simulations of several SPLASH-2 and PARSEC applications, we demonstrate that, armed with CR and OR, simply-linked list-based protocols are able to offer the performance of a non-scalable bit-vector directory while preserving scalability to larger core counts.

This work has been supported by the Spanish MINECO, as well as European Commission FEDER funds, under grants “TIN2012-38341-C04-03” and “TIN2015-66972-C5-3-R”, and by the Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia under grant “19295/PI/14”.

Notes

  1. Without loss of generality, we assume private L1 caches in each core and an inclusive, shared L2 cache distributed among them.

  2. In the case of clean shared replacements, the writeback buffer only needs to store the sharing information, not the data. Because this information is very small in List, it may alternatively be kept in a miss status holding register (MSHR) or a similar structure.
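
To give a rough sense of why the sharing information is so small in a list-based protocol, the following minimal sketch compares the per-entry sharing storage of a bit-vector directory with that of a singly linked sharer list. It assumes (this is not stated on this page) that a list node's sharing information is a single next-sharer pointer just wide enough to name any core; the function names are hypothetical and chosen only for illustration.

```python
def bit_vector_bits(num_cores: int) -> int:
    """Bit-vector directory: one presence bit per core."""
    return num_cores


def list_pointer_bits(num_cores: int) -> int:
    """Assumed list node: one pointer able to identify any of the cores."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using integers only.
    return max(1, (num_cores - 1).bit_length())


if __name__ == "__main__":
    for cores in (16, 64, 256, 1024):
        print(cores, bit_vector_bits(cores), list_pointer_bits(cores))
```

Under this assumption the bit-vector entry grows linearly with the core count while the pointer grows logarithmically, which is consistent with the note's remark that the information could fit in an MSHR-like structure.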

Author information

Corresponding author

Correspondence to Ricardo Fernández-Pascual.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fernández-Pascual, R., Ros, A., Acacio, M.E. (2016). Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_8

  • DOI: https://doi.org/10.1007/978-3-319-30695-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30694-0

  • Online ISBN: 978-3-319-30695-7

  • eBook Packages: Computer Science (R0)