Abstract
As the number of cores in a chip multiprocessor increases, the directory size becomes excessive. Recent research shows that directory size can be reduced by tracking private blocks with coarse-grain directory entries called region entries. In the dual-grain directory (DGD), each region entry uses a present bit vector to indicate which blocks of the region the region owner has cached. With a coarser region granularity, this bit vector, and therefore the region entry, becomes excessively long. Moreover, most of the latest scalable directories use a short-entry format with only one pointer. As a result, DGD offers limited flexibility in region granularity and is incompatible with the latest scalable directories. In this paper, we propose a scalable short-entry dual-grain coherence directory with flexible region granularity (SS-DGD). Private region entries use a counter instead of the original bit vector. Region entries with counters and private block entries with a single pointer always have the same length, which gives SS-DGD flexibility in region size. To reduce the directory size, SS-DGD is divided into a shared directory and a private directory, which holds private block entries and private region entries. With the same total number of directory entries, SS-DGD has a smaller directory size than DGD because its private directory entries are shorter. A detailed simulation-based study shows no statistically significant differences in execution time or network traffic between SS-DGD and DGD. In the evaluated 64-core system, our proposal can reduce the directory size by 29.9%. More importantly, the region entries in SS-DGD can be used in the latest scalable directories and have high potential to reduce the number of directory entries.
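The claim that counter-based region entries keep the same length as single-pointer block entries, while bit-vector region entries grow with region size, can be checked with simple bit-width arithmetic. The Python sketch below is a simplified model built on assumptions that are not taken from the paper: a 48-bit physical address, 64-byte blocks, set-index and state bits ignored, power-of-two region sizes, and a counter that stores the number of cached blocks minus one (assuming a region entry exists only while at least one of its blocks is cached). All names (`block_entry_bits`, `region_entry_bits`, and so on) are hypothetical, and the sketch models only entry widths, not how SS-DGD uses the counter during coherence actions.

```python
# Hypothetical entry-width model; parameters are illustrative assumptions,
# not values taken from the SS-DGD paper.
ADDR_BITS = 48    # assumed physical address width
BLOCK_BYTES = 64  # assumed cache block size


def log2_exact(x: int) -> int:
    """Return log2(x) for a positive power of two; raise otherwise."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def pointer_bits(cores: int) -> int:
    """Bits for one pointer naming a single core, i.e. ceil(log2(cores))."""
    return (cores - 1).bit_length()


def block_entry_bits(cores: int) -> int:
    """Private block entry: block tag plus a single owner pointer."""
    tag = ADDR_BITS - log2_exact(BLOCK_BYTES)
    return tag + pointer_bits(cores)


def region_entry_bits(cores: int, region_bytes: int, use_counter: bool) -> int:
    """Private region entry: region tag, owner pointer, and a tracking field.

    use_counter=False models a DGD-style present bit vector (one bit per block).
    use_counter=True models a counter storing (cached blocks - 1), which fits
    in log2(blocks per region) bits under the assumption stated above.
    """
    blocks = region_bytes // BLOCK_BYTES
    tag = ADDR_BITS - log2_exact(region_bytes)  # coarser region -> shorter tag
    track = log2_exact(blocks) if use_counter else blocks
    return tag + pointer_bits(cores) + track


if __name__ == "__main__":
    cores = 64
    print(f"private block entry: {block_entry_bits(cores)} bits")
    for region in (256, 1024, 4096):
        vec = region_entry_bits(cores, region, use_counter=False)
        cnt = region_entry_bits(cores, region, use_counter=True)
        print(f"{region:>5} B region: bit vector {vec} bits, counter {cnt} bits")
```

Under these assumptions, with 64 cores a 1024-byte region needs a 60-bit bit-vector entry but only a 48-bit counter entry, the same as a 48-bit private block entry. The bit-vector entry grows linearly with the number of blocks per region, whereas every bit the coarser region tag saves is spent on the counter, so the counter-based entry stays fixed. Real entry layouts add state bits and depend on set indexing, so the absolute numbers are only illustrative.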
Data availability
The datasets used or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62031009, in part by the Alibaba Innovative Research (AIR) Program, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), in part by the Fudan-ZTE Joint Lab, in part by the Pioneering Project of the Academy for Engineering and Technology, Fudan University (gyy2021-001), and in part by the CCF-Alibaba Innovative Research Fund for Young Scholars.
Author information
Contributions
Y and Y wrote the main manuscript text. All authors helped with the experiments and reviewed the manuscript.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Ethical Approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, Y., Qiu, Y., Liu, Y. et al. Scalable short-entry dual-grain coherence directories with flexible region granularity. J Supercomput 80, 2889–2911 (2024). https://doi.org/10.1007/s11227-023-05559-8
DOI: https://doi.org/10.1007/s11227-023-05559-8