Proceeding Downloads
GuaranTEE: Towards Attestable and Private ML with CCA
Machine-learning (ML) models are increasingly being deployed on edge devices to provide a variety of services. However, their deployment is accompanied by challenges in model privacy and auditability. Model providers want to ensure that (i) their ...
IA2: Leveraging Instance-Aware Index Advisor with Reinforcement Learning for Diverse Workloads
This study introduces the Instance-Aware Index Advisor (IA2), a novel deep reinforcement learning (DRL)-based approach for optimizing index selection in databases facing large action spaces of potential candidates. IA2 introduces the Twin Delayed Deep ...
Temporal Graph Generative Models: An empirical study
Graph Neural Networks (GNNs) have recently emerged as popular methods for learning representations of non-Euclidean data often encountered in diverse areas ranging from chemistry to source code generation. Recently, researchers have focused on learning ...
Deploying Stateful Network Functions Efficiently using Large Language Models
Stateful network functions are increasingly used in data centers. However, their scalability remains a significant challenge since parallelizing packet processing across multiple cores requires careful configuration to avoid compromising the application'...
The Importance of Workload Choice in Evaluating LLM Inference Systems
The success of Large Language Models (LLMs) across a wide range of applications and use cases has created the need for faster and more scalable systems for LLM inference. These systems speed up LLM inference by optimizing scheduling decisions or ...
Characterizing Training Performance and Energy for Foundation Models and Image Classifiers on Multi-Instance GPUs
- Connor Espenshade,
- Rachel Peng,
- Eumin Hong,
- Max Calman,
- Yue Zhu,
- Pritish Parida,
- Eun Kyung Lee,
- Martha A. Kim
GPUs are becoming a scarce resource in high demand, as many teams build and train increasingly advanced artificial intelligence workloads. As GPUs become more performant, they consume more energy, with NVIDIA's latest A100 and H100 graphics cards ...
ALS Algorithm for Robust and Communication-Efficient Federated Learning
- Neil Hurley,
- Erika Duriakova,
- James Geraci,
- Diarmuid O'Reilly-Morgan,
- Elias Tragos,
- Barry Smyth,
- Aonghus Lawlor
Federated learning is a distributed approach to machine learning in which a centralised server coordinates the learning task while training data is distributed among a potentially large set of clients. The focus of this paper is on top-N recommendations ...
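The alternating least squares (ALS) update underlying this line of work can be sketched in a few lines. The following is a rank-1 toy version on a small dense ratings matrix, not the paper's federated or communication-efficient variant; all names and parameter values are illustrative:

```python
def als_rank1(ratings, steps=20, reg=0.1):
    """Rank-1 alternating least squares on a ratings matrix.

    ratings[u][i] holds the observed rating, or None if missing.
    Each half-step fixes one factor vector and solves the other
    in closed form (trivial at rank 1)."""
    n_users, n_items = len(ratings), len(ratings[0])
    x = [1.0] * n_users  # user factors
    y = [1.0] * n_items  # item factors
    for _ in range(steps):
        # Fix item factors, solve each user factor.
        for u in range(n_users):
            obs = [i for i in range(n_items) if ratings[u][i] is not None]
            num = sum(y[i] * ratings[u][i] for i in obs)
            den = reg + sum(y[i] ** 2 for i in obs)
            x[u] = num / den
        # Fix user factors, solve each item factor.
        for i in range(n_items):
            obs = [u for u in range(n_users) if ratings[u][i] is not None]
            num = sum(x[u] * ratings[u][i] for u in obs)
            den = reg + sum(x[u] ** 2 for u in obs)
            y[i] = num / den
    return x, y

def predict(x, y, u, i):
    """Predicted rating is the product of the two scalar factors."""
    return x[u] * y[i]
```

In the federated setting studied here, the item-factor update would be computed from client contributions rather than centrally; this sketch only shows the alternating structure itself.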
SpeedyLoader: Efficient Pipelining of Data Preprocessing and Machine Learning Training
Data preprocessing, consisting of tasks like sample resizing, cropping, and filtering, is a crucial step in machine learning (ML) workflows. Even though the preprocessing step is largely ignored by work that focuses on optimizing training algorithms, in ...
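The overlap this work targets can be illustrated with a minimal producer-consumer sketch: a preprocessing thread feeds a bounded queue while the training loop consumes from it. The `preprocess` stand-in and all names here are placeholders, not SpeedyLoader's API:

```python
import queue
import threading

def preprocess(sample):
    # Stand-in for real work such as resizing, cropping, or filtering.
    return sample * sample

def run_pipeline(samples, capacity=4):
    """Overlap preprocessing (producer thread) with training (consumer).

    The bounded queue lets preprocessing run ahead of training by at
    most `capacity` samples, so neither side blocks the other for long."""
    q = queue.Queue(maxsize=capacity)
    SENTINEL = object()

    def producer():
        for s in samples:
            q.put(preprocess(s))
        q.put(SENTINEL)  # signal end of the dataset

    threading.Thread(target=producer, daemon=True).start()
    processed = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        processed.append(item)  # a real loop would run a training step here
    return processed
```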
Towards Low-Energy Adaptive Personalization for Resource-Constrained Devices
The personalization of machine learning (ML) models to address data drift is a significant challenge in the context of Internet of Things (IoT) applications. Presently, most approaches focus on fine-tuning either the full base model or its last few ...
An Analysis of Collocation on GPUs for Deep Learning Training
Deep learning training is an expensive process that extensively uses GPUs. However, not all model training saturates modern powerful GPUs. To create guidelines for such cases, this paper examines the performance of the different collocation methods ...
Priority Sampling of Large Language Models for Compilers
Large Language Models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples for low temperatures and incoherent samples for ...
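For reference, the Nucleus Sampling baseline the abstract contrasts against keeps the smallest set of highest-probability tokens whose cumulative probability reaches a threshold p, then samples from that renormalized set. A minimal sketch (function names are ours, and this is not the paper's Priority Sampling method):

```python
import random

def nucleus_filter(probs, p):
    """Indices of the smallest set of tokens whose cumulative
    probability reaches p (the 'nucleus'), highest-probability first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return nucleus

def nucleus_sample(probs, p, rng=random):
    """Sample a token index from the renormalized nucleus."""
    keep = nucleus_filter(probs, p)
    total = sum(probs[i] for i in keep)
    return rng.choices(keep, weights=[probs[i] / total for i in keep])[0]
```

At low temperature the nucleus collapses to a few tokens, which is where the repeated samples the abstract mentions come from.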
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving
Although prior work on batched inference and parameter-efficient fine-tuning has reduced the resource requirements of large language models (LLMs), challenges remain in resource-constrained environments such as on-premise infrastructures ...
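Continuous batching, which this work builds on, admits waiting requests the moment a slot frees up instead of waiting for the whole batch to drain. A toy scheduler sketch, where token counts stand in for real decoding; nothing here is the paper's deferred variant:

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (id, n_tokens). Each step decodes one token for
    every request in the batch; finished requests leave immediately and
    are replaced from the waiting queue (continuous batching)."""
    waiting = deque(requests)
    batch = {}  # request id -> tokens remaining
    steps = 0
    completed = []
    while waiting or batch:
        # Admit new requests as soon as slots free up.
        while waiting and len(batch) < max_batch:
            rid, n = waiting.popleft()
            batch[rid] = n
        steps += 1
        for rid in list(batch):
            batch[rid] -= 1
            if batch[rid] == 0:
                completed.append(rid)
                del batch[rid]
    return steps, completed
```

With requests of 1, 3, and 1 tokens and a batch size of 2, this finishes in 3 steps, whereas static batching (run {a, b} to completion, then {c}) would take 4.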
ML Training with Cloud GPU Shortages: Is Cross-Region the Answer?
The widespread adoption of ML has led to a high demand for GPU hardware and consequently, severe shortages of GPUs in the public cloud. Allocating a sufficient number of GPUs to train or fine-tune today's large ML models in a single cloud region is often ...
ALTO: An Efficient Network Orchestrator for Compound AI Systems
- Keshav Santhanam,
- Deepti Raghavan,
- Muhammad Shahir Rahman,
- Thejas Venkatesh,
- Neha Kunjal,
- Pratiksha Thaker,
- Philip Levis,
- Matei Zaharia
We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO leverages an optimization opportunity specific to generative language models, which is streaming intermediate outputs from the ...
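The streaming opportunity can be illustrated with plain generators: a downstream stage consumes tokens as the upstream stage emits them, rather than waiting for the full upstream output. This is only a sketch of the idea, not ALTO's actual interface:

```python
def stage_one(prompt):
    """First 'model' in the pipeline: emits tokens one at a time."""
    for word in prompt.split():
        yield word.upper()

def stage_two(token_stream):
    """Second stage starts processing each token as it arrives,
    instead of blocking until the whole upstream output is ready."""
    for token in token_stream:
        yield f"<{token}>"

def run(prompt):
    # Chaining generators gives per-token streaming between stages.
    return list(stage_two(stage_one(prompt)))
```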
FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission
Communication overhead is a significant bottleneck in federated learning (FL), which has been exacerbated with the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into ...
De-DSI: Decentralised Differentiable Search Index
This study introduces De-DSI, a novel framework that fuses large language models (LLMs) with genuine decentralization for information retrieval, particularly employing the differentiable search index (DSI) concept in a decentralized setting. Focused on ...
Towards Pareto Optimal Throughput in Small Language Model Serving
- Pol G. Recasens,
- Yue Zhu,
- Chen Wang,
- Eun Kyung Lee,
- Olivier Tardieu,
- Alaa Youssef,
- Jordi Torres,
- Josep Ll. Berral
Large language models (LLMs) have revolutionized the state of the art in many different natural language processing tasks. Although serving LLMs is computationally and memory demanding, the rise of Small Language Models (SLMs) offers new opportunities ...
Do Predictors for Resource Overcommitment Even Predict?
Resource overcommitment allows datacenters to improve resource efficiency. In this approach, the system allocates to users the amount of resources they are most likely to use, not necessarily the amount requested. To do so, the system monitors resource ...
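A common baseline for such predictors is to allocate a high percentile of recent usage plus some headroom, rather than the full request. A minimal sketch, assuming nearest-rank percentiles; the function names and the headroom factor are illustrative, not from the paper:

```python
def percentile(values, q):
    """Nearest-rank q-th percentile (q in 0..100) of a non-empty list.

    Integer ceiling division avoids floating-point rank errors."""
    s = sorted(values)
    rank = -(-q * len(s) // 100)
    return s[max(0, rank - 1)]

def predict_allocation(usage_history, q=95, headroom=1.1):
    """Overcommitment-style allocation: the q-th percentile of recent
    usage, scaled by a safety headroom, instead of the user's request."""
    return percentile(usage_history, q) * headroom
```

The paper's question is precisely whether predictors of this general shape actually track future usage; the sketch just shows the mechanism being evaluated.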
A Hybrid Decentralised Learning Topology for Recommendations with Improved Privacy
- Diarmuid O'Reilly Morgan,
- Elias Tragos,
- James Geraci,
- Qinqin Wang,
- Neil Hurley,
- Barry Smyth,
- Aonghus Lawlor
Many recent studies have investigated the extent to which decentralised topologies for machine learning can preserve privacy, showing that in various scenarios the exchanged model updates can leak user information. In this work, we analyse the privacy ...
Evaluating Deep Learning Recommendation Model Training Scalability with the Dynamic Opera Network
Deep learning is commonly used to make personalized recommendations to users for a wide variety of activities. However, deep learning recommendation model (DLRM) training is increasingly dominated by all-to-all and many-to-many communication patterns. ...
Comparative Profiling: Insights Into Latent Diffusion Model Training
Generative AI models are at the forefront of advancing creative and analytical tasks, pushing the boundaries of what machines can generate and comprehend. Among these, latent diffusion models represent significant advancements in generating high-fidelity ...
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling
Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service ...
Navigating Challenges and Technical Debt in Large Language Models Deployment
Large Language Models (LLMs) have become an essential tool in advancing artificial intelligence and machine learning, enabling outstanding capabilities in natural language processing and understanding. However, the efficient deployment of LLMs in ...
The Environmental Cost of Engineering Machine Learning-Enabled Systems: A Mapping Study
The integration of Machine Learning (ML) across public and industrial sectors has become widespread, posing unique challenges in comparison to conventional software development methods throughout the lifecycle of ML-Enabled Systems. Particularly, with ...
Enhancing Named Entity Recognition for Agricultural Commodity Monitoring with Large Language Models
Agriculture, as one of humanity's most essential industries, faces the challenge of adapting to an increasingly data-driven world. Strategic decisions in this sector hinge on access to precise and actionable data. Governments, major agriculture companies,...
Acceptance Rates
| Year | Submitted | Accepted | Rate |
|---|---|---|---|
| EuroMLSys '21 | 26 | 18 | 69% |
| Overall | 26 | 18 | 69% |