Scalable short-entry dual-grain coherence directories with flexible region granularity

Tang, Yuxin; Qiu, Yudi; Liu, Yanwei; Jiao, Jie; Zhang, Peng; Fan, Yibo

doi:10.1007/s11227-023-05559-8

Published: 23 August 2023

Volume 80, pages 2889–2911 (2024)

The Journal of Supercomputing

Abstract

As the number of cores in a chip multiprocessor increases, the directory size becomes excessive. Recent research shows that directory size can be reduced by tracking privately cached data with coarse-grain directory entries called region entries. In the dual-grain directory (DGD), the presence vector in each region entry is a bit vector that indicates which blocks of the region the owner has cached. Because this vector needs one bit per block, region entries become excessively long when a coarser region granularity is used. In addition, most of the latest scalable directories use a short-entry format with only one pointer. As a result, DGD offers limited flexibility in region granularity and is incompatible with the latest scalable directories. In this paper, we propose a scalable short-entry dual-grain coherence directory with flexible region granularity (SS-DGD). Private region entries use a counter instead of the original bit vector. Region entries with counters and private block entries with a single pointer always have the same length, which gives SS-DGD flexibility in region size. To reduce the directory size, SS-DGD is divided into a shared directory and a private directory; the latter holds both private block entries and private region entries. With the same total number of directory entries, SS-DGD has a smaller directory size than DGD because its private directory entries are shorter. A detailed simulation-based study shows no statistically significant differences in execution time or network traffic between SS-DGD and DGD. In a 64-core system, our proposal can reduce the directory size by 29.9%. More importantly, the region entries in SS-DGD can be used in the latest scalable directories and have high potential to reduce the number of directory entries.
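The entry-length argument above can be illustrated with simple bit counting. The sketch below is a back-of-the-envelope model, not the paper's actual entry layout. It assumes a hypothetical 42-bit physical address, 64-byte blocks, power-of-two core counts and region sizes, a tag that covers the full block (or region) address, no state or set-index bits, and a counter that stores "cached blocks minus one" (so it needs only log2(B) bits for B blocks per region). Under these assumptions, a DGD region entry grows linearly with B, while an SS-DGD region entry stays exactly as long as a single-pointer private block entry, because the region tag shrinks by the same log2(B) bits the counter adds.

```python
# Hypothetical parameters for illustration only (not taken from the paper).
PHYS_ADDR_BITS = 42   # assumed physical address width
BLOCK_BYTES = 64      # assumed cache block size


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two; raise ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def block_tag_bits() -> int:
    # Full block-address tag (set-index bits ignored for simplicity).
    return PHYS_ADDR_BITS - log2_exact(BLOCK_BYTES)


def private_block_entry_bits(cores: int) -> int:
    # Block tag + a single owner pointer of log2(cores) bits.
    return block_tag_bits() + log2_exact(cores)


def dgd_region_entry_bits(cores: int, blocks_per_region: int) -> int:
    # Region tag is log2(B) bits shorter, but the presence bit vector costs B bits.
    b = blocks_per_region
    return block_tag_bits() - log2_exact(b) + log2_exact(cores) + b


def ssdgd_region_entry_bits(cores: int, blocks_per_region: int) -> int:
    # Assumed encoding: the counter holds (cached blocks - 1), so log2(B) bits
    # suffice and exactly offset the bits saved in the shorter region tag.
    lb = log2_exact(blocks_per_region)
    return block_tag_bits() - lb + log2_exact(cores) + lb


if __name__ == "__main__":
    cores = 64
    for b in (4, 16, 64):
        print(f"B={b:>2}: block={private_block_entry_bits(cores)} "
              f"DGD region={dgd_region_entry_bits(cores, b)} "
              f"SS-DGD region={ssdgd_region_entry_bits(cores, b)}")
```

Varying `blocks_per_region` shows the contrast described in the abstract: the DGD region entry grows with B, whereas the SS-DGD region entry does not. Real directory entries also carry state bits and omit set-index bits from the tag, which this sketch ignores.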

Data availability

The datasets used or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62031009, in part by the Alibaba Innovative Research (AIR) Program, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), in part by the Fudan-ZTE Joint Lab, in part by the Pioneering Project of the Academy for Engineering and Technology, Fudan University (gyy2021-001), and in part by the CCF-Alibaba Innovative Research Fund for Young Scholars.

Author information

Authors and Affiliations

State Key Laboratory of ASIC and System, Department of Microelectronics, Fudan University, Shanghai, China
Yuxin Tang, Yudi Qiu, Yanwei Liu, Jie Jiao & Yibo Fan
Advanced Institute of Information Technology, Peking University, Beijing, China
Peng Zhang

Contributions

Y and Y wrote the main manuscript text. All authors helped with the experiments and reviewed the manuscript.

Corresponding author

Correspondence to Yibo Fan.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical Approval

Not applicable

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, Y., Qiu, Y., Liu, Y. et al. Scalable short-entry dual-grain coherence directories with flexible region granularity. J Supercomput 80, 2889–2911 (2024). https://doi.org/10.1007/s11227-023-05559-8

Accepted: 03 August 2023
Published: 23 August 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11227-023-05559-8
