DOI: 10.1145/3649329.3663517
Research Article | Open Access

Invited: New Solutions on LLM Acceleration, Optimization, and Application

Published: 07 November 2024

Abstract

Large Language Models (LLMs) have revolutionized a wide range of applications with their strong human-like understanding and creativity. Due to their continuously growing model size and complexity, LLM training and deployment pose significant challenges, often resulting in extremely high computational and storage costs and energy consumption. In this paper, we discuss recent advancements and research directions on (1) LLM algorithm-level acceleration, (2) LLM-hardware co-design for improved system efficiency, (3) LLM-to-accelerator compilation for customized LLM accelerators, and (4) LLM-aided design for HLS (High-Level Synthesis) functional verification. For each aspect, we present the background, our proposed solutions, and future directions. An extended version of this work can be found at: https://arxiv.org/abs/2406.10903.
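
To make topic (1) concrete, below is a minimal, illustrative Python sketch of draft-and-verify (speculative) decoding, one family of algorithm-level acceleration techniques surveyed in the paper. It is not the authors' implementation: the greedy draft/target predictors and the toy models at the bottom are hypothetical stand-ins for real LLMs, and a production system would verify all drafted tokens in a single batched forward pass rather than one call per position.

from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token predictor


def speculative_decode(target: Model, draft: Model,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify decoding: a cheap draft model proposes k tokens,
    the expensive target model verifies them, and the longest agreeing prefix is
    kept plus one corrected token, so the output matches plain target decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Draft k candidate tokens autoregressively with the cheap model.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify: a real system scores all k positions in one batched target
        #    forward pass; here we simply call the target position by position.
        accepted = []
        for t in proposal:
            expected = target(out + accepted)
            if expected == t:
                accepted.append(t)          # draft agreed with target: keep it
            else:
                accepted.append(expected)   # first mismatch: take target's token
                break
        else:
            # All k drafts accepted: append one extra "free" target token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new]


if __name__ == "__main__":
    # Toy deterministic "models" over a tiny integer vocabulary, for illustration only.
    target_model: Model = lambda seq: (sum(seq) * 7 + 3) % 11
    draft_model: Model = lambda seq: (sum(seq) * 7 + 3) % 11 if len(seq) % 3 else 0
    print(speculative_decode(target_model, draft_model, prompt=[1, 2], max_new=8))

Because the target model's greedy choice always wins at the first disagreement, the output is identical to decoding with the target model alone; the speedup comes from how many cheap draft tokens are accepted per expensive verification step.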



Information

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. large language models (LLMs)
  2. high-level synthesis (HLS)
  3. acceleration
  4. functional verification
  5. hardware design

Qualifiers

  • Research-article

Funding Sources

  • IBM-Illinois Discovery Accelerator Institute
  • AMD Center of Excellence at UIUC
  • AMD Heterogeneous Adaptive Compute Cluster (HACC) initiative
  • NSF (National Science Foundation)
  • Semiconductor Research Corporation (SRC)

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 1,234
  • Downloads (last 6 weeks): 458
Reflects downloads up to 01 Mar 2025


Cited By

  • (2025) Transitioning from MLOps to LLMOps: Navigating the Unique Challenges of Large Language Models. Information 16:2 (87). DOI: 10.3390/info16020087. Online publication date: 23-Jan-2025.
  • (2025) Large language models for cyber resilience: A comprehensive review, challenges, and future perspectives. Applied Soft Computing 170 (112663). DOI: 10.1016/j.asoc.2024.112663. Online publication date: Feb-2025.
  • (2024) Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues. Electronics 14:1 (120). DOI: 10.3390/electronics14010120. Online publication date: 30-Dec-2024.
  • (2024) Smart Technical Support System Development Using Knowledge Map-Aided Approach. 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA) (455-460). DOI: 10.1109/SUMMA64428.2024.10803829. Online publication date: 13-Nov-2024.
  • (2024) Comparative Analysis of Fine-Tuned LLM, BERT and DL Models for Customer Sentiment Analysis. 2024 13th International Conference on System Modeling & Advancement in Research Trends (SMART) (255-259). DOI: 10.1109/SMART63812.2024.10882546. Online publication date: 6-Dec-2024.
