
Scalable short-entry dual-grain coherence directories with flexible region granularity

The Journal of Supercomputing

Abstract

As the number of cores in a chip multiprocessor increases, the directory size becomes excessive. Current research shows that directory size can be reduced by tracking private entries with coarse-grain directory entries called region entries. To indicate which blocks the region owner has cached, the region entries in the dual-grain directory (DGD) use a presence vector in bit-vector format. If a coarser region granularity is used, the region entries become excessively long. Moreover, most of the latest scalable directories use the short-entry directory format with only one pointer. DGD therefore has limited flexibility in region granularity and is incompatible with the latest scalable directories. In this paper, we propose a scalable short-entry dual-grain coherence directory with flexible region granularity (SS-DGD). In private region entries, a counter is used instead of the original bit vector. Region entries using counters and private block entries using a single pointer always have the same length, which gives SS-DGD flexibility in region size. To reduce the directory size, SS-DGD is divided into a shared directory and a private directory that includes private block entries and private region entries. With the same total number of directory entries, SS-DGD has a smaller directory size than the previous DGD because the private directory entries in SS-DGD are shorter. A detailed simulation-based study shows no statistically significant differences in execution time or network traffic between SS-DGD and DGD. In a 64-core system, our proposal can reduce the directory size by 29.9%. More importantly, the region entries in SS-DGD can be used in the latest scalable directories and have a high potential to compress the number of directory entries.
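The entry-length argument in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not the paper's exact entry layout: the function names, the 30-bit block tag, and the assumptions that the region size is a power of two, that a region tag is shorter than a block tag by log2(blocks per region) bits, and that the counter stores the number of cached blocks minus one are all hypothetical choices made for illustration.

```python
from math import ceil, log2


def pointer_bits(num_cores: int) -> int:
    """Bits needed for a single owner pointer (assumed encoding)."""
    return ceil(log2(num_cores))


def bitvector_region_payload(num_cores: int, blocks_per_region: int) -> int:
    """DGD-style region payload: owner pointer plus one presence bit per block."""
    return pointer_bits(num_cores) + blocks_per_region


def counter_region_payload(num_cores: int, blocks_per_region: int) -> int:
    """Counter-style region payload: owner pointer plus a counter of cached blocks.

    Assumption: a valid entry tracks 1..R blocks, stored as (count - 1),
    so log2(R) bits suffice when R is a power of two.
    """
    return pointer_bits(num_cores) + ceil(log2(blocks_per_region))


def block_entry_bits(block_tag_bits: int, num_cores: int) -> int:
    """Private block entry: block tag plus a single pointer."""
    return block_tag_bits + pointer_bits(num_cores)


def counter_region_entry_bits(block_tag_bits: int, num_cores: int,
                              blocks_per_region: int) -> int:
    """Counter-based region entry: shorter region tag plus pointer plus counter.

    Assumption: the region tag drops the log2(R) block-offset bits.
    """
    region_tag_bits = block_tag_bits - ceil(log2(blocks_per_region))
    return region_tag_bits + counter_region_payload(num_cores, blocks_per_region)


if __name__ == "__main__":
    cores, tag = 64, 30  # hypothetical 64-core system with a 30-bit block tag
    for region in (4, 16, 64):
        print(region,
              bitvector_region_payload(cores, region),
              counter_region_payload(cores, region),
              counter_region_entry_bits(tag, cores, region),
              block_entry_bits(tag, cores))
```

Under these assumptions the bit-vector payload grows linearly with the region size, while the counter-based region entry stays the same length as a single-pointer block entry for every region size, because the bits saved in the tag are exactly the bits spent on the counter.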


Data availability

The datasets used or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62031009, in part by the Alibaba Innovative Research (AIR) Program, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), in part by the Fudan-ZTE Joint Lab, in part by the Pioneering Project of the Academy for Engineering and Technology, Fudan University (gyy2021-001), and in part by the CCF-Alibaba Innovative Research Fund For Young Scholars.

Author information

Contributions

Y and Y wrote the main manuscript text. All authors helped with the experiments and reviewed the manuscript.

Corresponding author

Correspondence to Yibo Fan.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, Y., Qiu, Y., Liu, Y. et al. Scalable short-entry dual-grain coherence directories with flexible region granularity. J Supercomput 80, 2889–2911 (2024). https://doi.org/10.1007/s11227-023-05559-8

  • DOI: https://doi.org/10.1007/s11227-023-05559-8
