ASPLOS 2024 · DOI: 10.1145/3620665.3640364

Carat: Unlocking Value-Level Parallelism for Multiplier-Free GEMMs

Published: 27 April 2024

ABSTRACT

In recent years, hardware architectures optimized for general matrix multiplication (GEMM) have been studied extensively to deliver better performance and efficiency for deep neural networks. With the trend towards batched, low-precision data, e.g., the FP8 format targeted in this work, we observe growing untapped potential for value reuse. We propose a novel computing paradigm, value-level parallelism, whereby unique products are computed only once, and different inputs subscribe to (select) their products via temporal coding. Our architecture, Carat, employs value-level parallelism and transforms multiplication into accumulation, performing GEMMs with efficient multiplier-free hardware. Experiments show that, on average, Carat improves iso-area throughput and energy efficiency by 1.02× and 1.06×, respectively, over a systolic array, and by 3.2× and 4.3× when scaled up to multiple nodes.
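To make the subscription idea concrete, below is a minimal NumPy sketch of value-level parallelism in software; it is our illustration under stated assumptions, not the authors' code, and the name vlp_gemm is hypothetical. Because low-precision inputs such as FP8 take on few distinct values, each reduction step can form the product of every unique input value with the shared operand row exactly once; all input elements holding that value then merely select their precomputed product. Carat realizes this selection with temporal coding in multiplier-free hardware, whereas the sketch emulates it with array indexing.

```python
import numpy as np

def vlp_gemm(A, B):
    """Software sketch of value-level parallelism for C = A @ B.

    Assumes A is batched, low-precision data (e.g., FP8-quantized), so
    each column of A contains far fewer unique values than rows. For each
    reduction step k, every unique value's product with B[k, :] is computed
    once; each element of A[:, k] then subscribes to (selects) its product
    instead of triggering a fresh multiplication.
    """
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=np.float32)
    for k in range(K):
        # Deduplicate the k-th input column: uniq holds the distinct
        # values, inverse maps each of the M elements back to its value.
        uniq, inverse = np.unique(A[:, k], return_inverse=True)
        # Each unique product is computed only once: |uniq| x N, with
        # |uniq| bounded by the number of representable FP8 values.
        products = np.outer(uniq, B[k, :])
        # Every input element selects its product by index and accumulates.
        C += products[inverse]
    return C

# Usage: with few distinct input values, the result matches a dense GEMM.
rng = np.random.default_rng(0)
A = rng.choice([-1.5, -0.5, 0.5, 1.5], size=(64, 32)).astype(np.float32)
B = rng.standard_normal((32, 16)).astype(np.float32)
assert np.allclose(vlp_gemm(A, B), A @ B, atol=1e-4)
```

The savings come from the gather replacing per-element multiplies: with M inputs per column but only |uniq| distinct values, the multiply count drops by a factor of roughly M / |uniq|, which grows with batch size at fixed precision.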
