DOI: 10.1145/3629527.3651404

LLaMPS: Large Language Models Placement System

Published: 07 May 2024

Abstract

The rapid expansion of Large Language Models (LLMs) presents significant challenges for efficient inference deployment, primarily due to their substantial memory and computational requirements. Many enterprises possess a variety of computing resources (servers, VMs, PCs, laptops) that cannot individually host a complete LLM. Collectively, however, these resources may be adequate for even the most demanding LLMs. We introduce LLaMPS, a novel tool designed to optimally distribute blocks [1] of LLMs across the computing resources available within an enterprise. LLaMPS leverages the unused capacity of these machines, enabling decentralized hosting of LLMs. The tool lets users contribute their machine's resources to a shared pool, allowing others on the network to access those resources for inference tasks. At its core, LLaMPS employs a distributed framework to allocate the transformer blocks of an LLM across multiple servers. Where a model is already deployed, users can directly access inference results via a GUI or API. Our tool has undergone extensive testing with several open-source LLMs, including BLOOM-560m, BLOOM-3b, BLOOM-7b1, Falcon-40b, and LLaMA-70b. It is currently deployed in a real-world enterprise network, demonstrating its practical applicability and effectiveness.

Reference

[1] Ravi Kumar Singh, Likhit Bandamudi, Shruti Kunde, Mayank Mishra, and Rekha Singhal. 2024. Leftovers for LlaMA. In International Conference on Performance Engineering (ICPE), accepted.


Published In

ICPE '24 Companion: Companion of the 15th ACM/SPEC International Conference on Performance Engineering
May 2024
305 pages
ISBN:9798400704451
DOI:10.1145/3629527
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. distributed inference
  2. llms
  3. optimal block placement

Qualifiers

  • Abstract

Conference

ICPE '24

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%



Article Metrics

  • Total Citations: 0
  • Total Downloads: 65
  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 15

Reflects downloads up to 08 Mar 2025
