DOI: 10.1145/3479876.3481590
short-paper

DUB: Dynamic Underclocking and Bypassing in NoCs for Heterogeneous GPU Workloads

Published: 08 October 2021

ABSTRACT

The performance of graphics processing unit (GPU) workloads can be sensitive to the various clock domains that are dynamically tunable in modern GPUs. In this work, we observe that GPU application performance is sensitive to network-on-chip (NoC) clock frequencies and that this sensitivity varies during the execution of GPU kernels. We note that traditional dynamic voltage and frequency scaling (DVFS) techniques do not adapt well to this heterogeneity. To that end, we introduce DUB, a Dynamic Underclocking and Bypassing technique for such heterogeneous GPU workloads. DUB bypasses re-timer flops and routers while underclocking the NoC, enabling high power savings at minimal performance loss. Compared to the baseline, we observe a 26% improvement in power savings with only a 3% degradation in performance, beating oracular DVFS techniques.
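The intuition behind pairing underclocking with bypassing can be illustrated with a toy latency model. The sketch below is not the paper's model: the function name `path_latency_ns`, the `stages_per_cycle` parameter, and all numbers are hypothetical. It assumes only that every router or re-timer flop on a path normally costs one clock cycle, and that a slower clock leaves enough slack per cycle for a signal to pass through several such stages when the intermediate ones are bypassed.

```python
import math


def path_latency_ns(stages: int, freq_ghz: float, stages_per_cycle: int = 1) -> float:
    """Latency in nanoseconds of a NoC path under a toy model.

    stages           -- sequential stages (routers plus re-timer flops) on the path
    freq_ghz         -- NoC clock frequency in GHz
    stages_per_cycle -- stages traversed per clock cycle; 1 means no bypassing
                        (hypothetical knob, chosen for illustration only)
    """
    cycles = math.ceil(stages / stages_per_cycle)
    return cycles / freq_ghz  # cycles divided by GHz gives nanoseconds


# Illustrative numbers only, not taken from the paper.
baseline = path_latency_ns(stages=8, freq_ghz=2.0)            # full clock, no bypass
underclocked = path_latency_ns(stages=8, freq_ghz=1.0)        # half clock, no bypass
underclocked_bypass = path_latency_ns(stages=8, freq_ghz=1.0,
                                      stages_per_cycle=2)     # half clock, bypass every other stage

print(baseline, underclocked, underclocked_bypass)
```

In this toy setting, halving the clock alone doubles the path latency, while halving the clock and bypassing every other stage keeps the latency at the baseline value. Power is not modeled here; the sketch only shows why bypassing can offset the latency cost of a slower NoC clock.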

Published in

NOCS '21: Proceedings of the 15th IEEE/ACM International Symposium on Networks-on-Chip
October 2021
91 pages
ISBN: 9781450390835
DOI: 10.1145/3479876

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 14 of 44 submissions, 32%
