A Scalable BFloat16 Dot-Product Architecture for Deep Learning

Published: 05 June 2023
DOI: 10.1145/3583781.3590318

ABSTRACT

The BFloat16 (BF16) format has recently driven the development of deep learning owing to its higher energy efficiency and lower memory consumption than the traditional single-precision (FP32) format. This paper presents a scalable BF16 dot-product (DoP) architecture for high-performance deep-learning computing. A novel 4-term DoP unit is proposed as the fundamental module of the architecture; it performs a 4-term DoP operation in three cycles. DoP units with more terms are constructed by extending this fundamental unit: early exponent comparison is performed to hide latency, and intermediate normalization and rounding are omitted to improve accuracy and further reduce latency. Compared with a discrete design, the proposed architecture reduces latency by 22.8% for a 4-term DoP, and the reduction grows as the size of the DoP operation increases. Compared with existing BF16 designs, the proposed 64-term architecture achieves at least 1.88× better normalized energy efficiency and 20.3× higher throughput.

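The abstract pins down the dataflow that the architecture relies on: the exponents of the partial products are compared early, the products are accumulated without intermediate normalization or rounding, and a single normalization/rounding step is applied at the output. As a rough behavioral reference only (not the paper's hardware datapath), the Python sketch below mimics that flow for four terms. The helper names bf16_fields and dot4_bf16 are illustrative, BF16 operands are obtained by simple truncation of FP32, and special values (NaN, infinity, BF16 subnormals) are ignored.

import struct

def bf16_fields(x):
    """Truncate an FP32 value to BFloat16 (keep the top 16 bits; real hardware
    would round to nearest even) and return (sign, biased exponent, significand).
    NaN/Inf and BF16 subnormals are not handled in this sketch."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    sign = (bits >> 15) & 1
    exp = (bits >> 7) & 0xFF              # 8-bit biased exponent (bias 127)
    man = bits & 0x7F
    if exp != 0:
        man |= 0x80                       # implicit leading one -> 8-bit significand
    return sign, exp, man

def dot4_bf16(a, b):
    """Behavioral sketch of a fused 4-term BF16 dot product: multiply the
    significands, compare the product exponents early, align every product to a
    common exponent, accumulate without intermediate normalization or rounding,
    and round only once at the end (here via ordinary float arithmetic)."""
    prods = []
    for x, y in zip(a, b):
        sx, ex, mx = bf16_fields(x)
        sy, ey, my = bf16_fields(y)
        # product value = man * 2**(exp - 127 - 14), with man = mx * my (16 bits)
        prods.append((sx ^ sy, ex + ey - 127, mx * my))

    emin = min(e for _, e, _ in prods)    # "early" exponent comparison
    acc = 0                               # wide fixed-point accumulator
    for sign, exp, man in prods:
        aligned = man << (exp - emin)     # align to the common exponent
        acc += -aligned if sign else aligned

    # single final normalization/rounding, standing in for the one rounding
    # step at the unit's output
    return acc * 2.0 ** (emin - 127 - 14)

# Example: all operands are exactly representable in BF16, so the result
# matches a plain floating-point dot product (-8.6875)
a = [1.5, -2.25, 0.75, 4.0]
b = [0.5, 1.75, -8.0, 0.125]
print(dot4_bf16(a, b), sum(x * y for x, y in zip(a, b)))

Aligning every product to the smallest exponent with left shifts keeps this Python model exact; the hardware presumably right-shifts the products toward the largest exponent into a sufficiently wide accumulator instead, which is what comparing the exponents early, before the significand products are ready, makes cheap to do.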

Published in

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
June 2023, 731 pages
ISBN: 9798400701252
DOI: 10.1145/3583781
Copyright © 2023 Owner/Author

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 312 of 1,156 submissions, 27%

        Upcoming Conference

        GLSVLSI '24
        Great Lakes Symposium on VLSI 2024
        June 12 - 14, 2024
        Clearwater , FL , USA
      • Article Metrics

        • Downloads (Last 12 months)87
        • Downloads (Last 6 weeks)4

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader