DOI: 10.1145/3600061.3603136
Poster

Training ChatGPT-like Models with In-network Computation

Published: 05 September 2023

Abstract

ChatGPT demonstrates the enormous potential of large language models (LLMs). These models easily reach billions of parameters, which puts training beyond the means of most practitioners. We propose a paradigm for training LLMs using distributed in-network computation on routers. Our preliminary results show that our design allows LLMs to be trained at a reasonable training rate without demanding extensive GPU resources.
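
The poster does not include an implementation, but the author tags point to pipeline parallelism as the underlying technique. As a rough, hypothetical sketch (in PyTorch; the two-stage split, dimensions, and names are illustrative assumptions, not the paper's design), a model can be cut into sequential stages and micro-batches streamed through them, with each stage standing in for one in-network device:

    import torch
    import torch.nn as nn

    # Hypothetical sketch of pipeline-parallel training; nothing here is
    # taken from the paper. Each stage stands in for one in-network device.
    DIM, MICRO_BATCHES = 64, 4

    # Split one model into two sequential stages. In an in-network design,
    # each stage would run on a different device along the routing path.
    stage0 = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU())
    stage1 = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, 1))

    opt = torch.optim.SGD(
        list(stage0.parameters()) + list(stage1.parameters()), lr=1e-2
    )

    x = torch.randn(32, DIM)  # one global batch of toy data
    y = torch.randn(32, 1)

    opt.zero_grad()
    for xb, yb in zip(x.chunk(MICRO_BATCHES), y.chunk(MICRO_BATCHES)):
        act = stage0(xb)   # stage boundary: activations would cross the network here
        out = stage1(act)
        loss = nn.functional.mse_loss(out, yb)
        loss.backward()    # gradients flow back across the same boundary
    opt.step()             # one parameter update after all micro-batches

Micro-batching is what makes a pipeline worthwhile: while one micro-batch occupies a later stage, the next can occupy an earlier one, so the devices along the path stay busy instead of idling.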



Published In

APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking
June 2023
229 pages
ISBN: 9798400707827
DOI: 10.1145/3600061
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ChatGPT
  2. In-network Computation
  3. Large Language Model
  4. Pipeline Parallelism

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

APNet 2023: 7th Asia-Pacific Workshop on Networking
June 29 - 30, 2023
Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 50 of 118 submissions, 42%

