DOI: 10.1145/3649329.3663517
Research Article | Open Access

Invited: New Solutions on LLM Acceleration, Optimization, and Application

Published: 07 November 2024

Abstract

Large Language Models (LLMs) have revolutionized a wide range of applications with their strong human-like understanding and creativity. Due to their continuously growing model size and complexity, LLM training and deployment pose significant challenges, often resulting in extremely high computational and storage costs and energy consumption. In this paper, we discuss recent advancements and research directions on (1) LLM algorithm-level acceleration, (2) LLM-hardware co-design for improved system efficiency, (3) LLM-to-accelerator compilation for customized LLM accelerators, and (4) LLM-aided design for HLS (High-Level Synthesis) functional verification. For each aspect, we present the background, our proposed solutions, and future directions. An extended version of this work can be found at: https://arxiv.org/abs/2406.10903.
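
To make topic (1) concrete, below is a minimal, illustrative Python sketch of draft-and-verify (speculative) decoding, one family of algorithm-level acceleration techniques surveyed in the paper. It is not the authors' implementation: the greedy draft/target predictors and the toy models at the bottom are hypothetical stand-ins for real LLMs, and a production system would verify all drafted tokens in a single batched forward pass rather than one call per position.

from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token predictor


def speculative_decode(target: Model, draft: Model,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify decoding: a cheap draft model proposes k tokens,
    the expensive target model verifies them, and the longest agreeing prefix is
    kept plus one corrected token, so the output matches plain target decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Draft k candidate tokens autoregressively with the cheap model.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify: a real system scores all k positions in one batched target
        #    forward pass; here we simply call the target position by position.
        accepted = []
        for t in proposal:
            expected = target(out + accepted)
            if expected == t:
                accepted.append(t)          # draft agreed with target: keep it
            else:
                accepted.append(expected)   # first mismatch: take target's token
                break
        else:
            # All k drafts accepted: append one extra "free" target token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new]


if __name__ == "__main__":
    # Toy deterministic "models" over a tiny integer vocabulary, for illustration only.
    target_model: Model = lambda seq: (sum(seq) * 7 + 3) % 11
    draft_model: Model = lambda seq: (sum(seq) * 7 + 3) % 11 if len(seq) % 3 else 0
    print(speculative_decode(target_model, draft_model, prompt=[1, 2], max_new=8))

Because the target model's greedy choice always wins at the first disagreement, the output is identical to decoding with the target model alone; the speedup comes from how many cheap draft tokens are accepted per expensive verification step.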



Information

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. large language models (LLMs)
  2. high-level synthesis (HLS)
  3. acceleration
  4. functional verification
  5. hardware design

Qualifiers

  • Research-article

Funding Sources

  • IBM-Illinois Discovery Accelerator Institute
  • AMD Center of Excellence at UIUC
  • AMD Heterogeneous Adaptive Compute Cluster (HACC) initiative
  • NSF (National Science Foundation)
  • Semiconductor Research Corporation (SRC)

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 1,234
  • Downloads (last 6 weeks): 458
Reflects downloads up to 01 Mar 2025


Cited By

  • (2025) Transitioning from MLOps to LLMOps: Navigating the Unique Challenges of Large Language Models. Information 16:2 (87). DOI: 10.3390/info16020087. Online publication date: 23-Jan-2025.
  • (2025) Large language models for cyber resilience: A comprehensive review, challenges, and future perspectives. Applied Soft Computing 170 (112663). DOI: 10.1016/j.asoc.2024.112663. Online publication date: Feb-2025.
  • (2024) Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues. Electronics 14:1 (120). DOI: 10.3390/electronics14010120. Online publication date: 30-Dec-2024.
  • (2024) Smart Technical Support System Development Using Knowledge Map-Aided Approach. 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA) (455-460). DOI: 10.1109/SUMMA64428.2024.10803829. Online publication date: 13-Nov-2024.
  • (2024) Comparative Analysis of Fine-Tuned LLM, BERT and DL Models for Customer Sentiment Analysis. 2024 13th International Conference on System Modeling & Advancement in Research Trends (SMART) (255-259). DOI: 10.1109/SMART63812.2024.10882546. Online publication date: 6-Dec-2024.
