research-article

A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures

Published: 13 September 2021

Abstract

Cache coherence ensures the correctness of cached data in multi-core processors. Traditional implementations of existing protocols do not scale to many-core architectures: snoopy coherence requires ordered networks that scale poorly, while directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many-core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and a hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through a global snoopy protocol. The ordered network for global snooping is provided by low-latency, low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation, which eliminates coherence for private blocks. Evaluations show that WiSH reduces traffic and runtime while requiring less storage and lower energy than existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many-core processors.
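The two-level idea described above can be illustrated with a toy model. The sketch below is not the WiSH protocol itself; it is a minimal illustration under stated assumptions: each cluster keeps a local directory mapping a block address to the cores that hold it, a write to a shared block triggers one global broadcast that every other cluster snoops, and blocks already classified as private skip coherence entirely. The names `Cluster`, `HybridSystem`, `private_blocks`, and `global_broadcasts` are hypothetical, and cache states, read misses, and the wireless ordering mechanism are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One cluster of cores with a hypothetical local directory."""
    cluster_id: int
    # Local directory: block address -> set of core ids in this cluster
    # that currently hold a copy of the block.
    directory: dict = field(default_factory=dict)

    def holds(self, addr):
        return bool(self.directory.get(addr))

    def invalidate(self, addr):
        """Drop all local copies of addr; return how many were dropped."""
        return len(self.directory.pop(addr, set()))


class HybridSystem:
    """Toy hybrid: directory inside a cluster, broadcast snooping across clusters."""

    def __init__(self, num_clusters):
        self.clusters = [Cluster(i) for i in range(num_clusters)]
        # Assumption: the private/shared classification is given from outside.
        self.private_blocks = set()
        self.global_broadcasts = 0

    def read(self, cluster_id, core_id, addr):
        if addr in self.private_blocks:
            return  # private blocks are not tracked at all
        self.clusters[cluster_id].directory.setdefault(addr, set()).add(core_id)

    def write(self, cluster_id, core_id, addr):
        """Perform a write; return the number of remote copies invalidated."""
        if addr in self.private_blocks:
            return 0  # no directory update and no broadcast
        home = self.clusters[cluster_id]
        # Local step: the writer becomes the only sharer in its own cluster.
        home.directory[addr] = {core_id}
        # Global step: a single broadcast that all other clusters snoop.
        self.global_broadcasts += 1
        return sum(c.invalidate(addr) for c in self.clusters if c is not home)


system = HybridSystem(num_clusters=2)
system.read(0, 0, 0x40)
system.read(1, 3, 0x40)
remote_invalidations = system.write(0, 0, 0x40)
```

In this simplified model the cost of a shared write is one broadcast regardless of how many clusters exist, while the per-cluster directory only needs to track the cores inside its own cluster.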



Published in

ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 1
January 2022
230 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3483335

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 13 September 2021
• Accepted: 1 April 2021
• Revised: 1 March 2021
• Received: 1 May 2020


Qualifiers

• research-article
• Refereed