DOI: 10.1145/3620666
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
ACM2024 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, La Jolla, CA, USA, 27 April 2024 - 1 May 2024
ISBN:
979-8-4007-0386-7
Published:
27 April 2024
Abstract

Welcome to the third volume of ASPLOS'24: the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. This document is mostly dedicated to the 2024 fall cycle but also provides some statistics summarizing all three cycles.

We introduced several notable changes to ASPLOS this year, most of which were discussed in our previous messages from the program chairs in Volumes 1 and 2, including: (1) significantly increasing the program committee size to over 220 members (more than twice the size of last year); (2) forgoing synchronous program committee (PC) meetings and instead making all decisions online; (3) overhauling the review assignment process; (4) developing an automated script that identifies submission format violations, e.g., disallowed vertical space manipulations that "squeeze" space; (5) introducing the new ASPLOS role of Program Vice Chairs to cope with the increased number of submissions and the added load caused by forgoing synchronous PC meetings; and (6) characterizing a systematic problem that ASPLOS is facing in reviewing quantum computing submissions, describing how we addressed it, and highlighting how we believe it should be handled in the future.

Assuming readers have read our previous messages, we describe here only the differences between the current cycle and the previous ones. These include: (1) finally unifying the submission and acceptance paper formatting instructions (forgoing the 'jpaper' class) to spare authors of accepted papers the need to reformat; (2) describing the methodology we employed to select best papers, which we believe ensures quality and hope will persist; and (3) reporting the ethical incidents we encountered and how we handled them. In the final, fourth volume, when the outcome of the ASPLOS'24 fall major revisions becomes known, we plan to conduct a broader analysis of all the data we have gathered throughout the year.

The following are some key statistics of the fall cycle: 340 submissions were finalized (43% more than last year's fall count and 17% fewer than in our summer cycle), of which 111 are related to accelerators/FPGAs/GPUs, 105 to machine learning, 54 to security, 50 to datacenter/cloud, and 50 to storage/memory; 183 (54%) submissions were promoted to the second review round; 39 (11.5%) papers were accepted (of which 19 were awarded artifact evaluation badges); 33 (9.7%) submissions were allowed to submit major revisions and are currently under review (these will be addressed in the fourth volume of ASPLOS'24 and will be presented in ASPLOS'25 if accepted); 1,368 reviews were uploaded; and 4,949 comments were generated during online discussions, of which 4,070 were dedicated to the submissions that made it to the second review round.

This year, in the submission form, we asked authors to specify which of the three ASPLOS research areas are related to their submitted work. Analyzing this data revealed that 80%, 39%, and 29% of the submissions are categorized by their authors as related to architecture, operating systems, and programming languages, respectively. This is the largest gap between architecture and the other two areas that we have observed across the cycles. About 46% of the fall submissions are "interdisciplinary," namely, associated with two or more of the three areas.

Overall, throughout all the ASPLOS'24 cycles, we received 922 submissions, constituting a 1.54x increase compared to last year. Our reviewers submitted a total of 3,634 reviews containing more than 2.6 million words, and we also generated 12,655 online comments consisting of nearly 1.2 million words. As planned, PC members submitted an average of 15.7 reviews and a median of 15, and external review committee (ERC) members submitted an average of 4.7 and a median of 5.

We have accepted 170 papers thus far, written by 1,100 authors, leading to an 18.4% acceptance rate, with the aforementioned 33 major revisions still under review. Assuming that the revision acceptance rate will be similar to that of previous cycles, we estimate that ASPLOS'24 will accept nearly 200 (!) papers, namely, 21%–22% of the submissions.
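
The percentages quoted above can be re-derived from the counts given in this message. The following Python sketch is illustrative only: the helper `rate` and all variable names are our own, not part of the proceedings, and the projected paper counts simply apply the quoted 21% to 22% range to the 922 total submissions.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted in the chairs' message (names are illustrative).
TOTAL_SUBMISSIONS = 922      # all ASPLOS'24 cycles combined
ACCEPTED_SO_FAR = 170        # accepted before the fall major revisions resolve
FALL_SUBMISSIONS = 340       # finalized fall-cycle submissions
FALL_ACCEPTED = 39           # fall-cycle acceptances
FALL_MAJOR_REVISIONS = 33    # fall-cycle major revisions still under review

overall_rate = rate(ACCEPTED_SO_FAR, TOTAL_SUBMISSIONS)
fall_accept_rate = rate(FALL_ACCEPTED, FALL_SUBMISSIONS)
fall_revision_rate = rate(FALL_MAJOR_REVISIONS, FALL_SUBMISSIONS)

# Paper counts implied by the projected 21%-22% final acceptance rate.
projected_low = round(0.21 * TOTAL_SUBMISSIONS)
projected_high = round(0.22 * TOTAL_SUBMISSIONS)

print(f"overall acceptance rate so far: {overall_rate}%")
print(f"fall acceptance rate: {fall_accept_rate}%")
print(f"fall major-revision rate: {fall_revision_rate}%")
print(f"projected accepted papers: {projected_low} to {projected_high}")
```

The computed rates match the 18.4%, 11.5%, and 9.7% stated in the text, and the projected range brackets the "nearly 200" estimate.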

The ASPLOS'24 program consists of 193 papers: the 170 papers we accepted thus far and, in addition, 23 major revisions from the fall cycle of ASPLOS'23, which were re-reviewed and accepted. The full details are available in the PDF of the front matter.

invited-talk
Societal infrastructure in the age of Artificial General Intelligence

Today, we are at an inflection point in computing where emerging Generative AI services are placing unprecedented demand for compute while the existing architectural patterns for improving efficiency have stalled. In this talk, we will discuss the likely ...

invited-talk
Challenges and Opportunities for Systems Using CXL Memory

We are at the start of the technology cycle for compute express link (CXL) memory, which is a significant opportunity and challenge for architecture, operating systems, and programming languages. The 3.0 CXL specification allows multiple, physically ...

invited-talk
Harnessing the Power of Specialization for Sustainable Computing

Computing is critical to address some of the most pressing needs of humanity today, including climate change mitigation and adaptation. However, it is also the source of a significant and steadily increasing carbon toll, attributed in part to the ...

invited-talk
AWS Trainium: The Journey for Designing and Optimization Full Stack ML Hardware

Machine learning accelerators present a unique set of design challenges across chip architecture, instruction set, server design, compiler, and both inter- and intra-chip connectivity. With AWS Trainium, we've utilized AWS's end-to-end ownership from ...

8-bit Transformer Inference and Fine-tuning for Edge Accelerators

Transformer models achieve state-of-the-art accuracy on natural language processing (NLP) and vision tasks, but demand significant computation and memory resources, which makes it difficult to perform inference and training (fine-tuning) on edge ...

A Midsummer Night’s Tree: Efficient and High Performance Secure SCM

Secure memory is a highly desirable property to prevent memory corruption-based attacks. The emergence of nonvolatile, storage class memory (SCM) devices presents new challenges for secure memory. Metadata for integrity verification, organized in a ...

research-article
Open Access
A shared compilation stack for distributed-memory parallelism in stencil DSLs

Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target ...

research-article
Accelerating Multi-Scalar Multiplication for Efficient Zero Knowledge Proofs with Multi-GPU Systems

Zero-knowledge proof is a cryptographic primitive that allows for the validation of statements without disclosing any sensitive information, foundational in applications like verifiable outsourcing and digital currency. However, the extensive proof ...

research-article
Open Access
ACES: Accelerating Sparse Matrix Multiplication with Adaptive Execution Flow and Concurrency-Aware Cache Optimizations

Sparse matrix-matrix multiplication (SpMM) is a critical computational kernel in numerous scientific and machine learning applications. SpMM involves massive irregular memory accesses and poses great challenges to conventional cache-based computer ...

AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Large language models (LLMs) have demonstrated powerful capabilities, requiring huge memory with their increasing sizes and sequence lengths, thus demanding larger parallel systems. The broadly adopted pipeline parallelism introduces even heavier and ...

research-article
Open Access
AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs

This work investigates a new erase scheme in NAND flash memory to improve the lifetime and performance of modern solid-state drives (SSDs). In NAND flash memory, an erase operation applies a high voltage (e.g., > 20 V) to flash cells for a long time (...

AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines

In an effort to increase the utilization of data center resources, cloud providers have introduced a new type of virtual machine (VM) offering, called a burstable VM (BVM). Our work is the first to study the characteristics of burstable VMs (based on ...

BeeZip: Towards An Organized and Scalable Architecture for Data Compression

Data compression plays a critical role in operating systems and large-scale computing workloads. Its primary objective is to reduce network bandwidth consumption and memory/storage capacity utilization. Given the need to manipulate hash tables, and ...

research-article
Boost Linear Algebra Computation Performance via Efficient VNNI Utilization

Intel's Vector Neural Network Instruction (VNNI) provides higher efficiency on calculating dense linear algebra (DLA) computations than conventional SIMD instructions. However, existing auto-vectorizers frequently deliver suboptimal utilization of VNNI ...

research-article
Open Access
C4CAM: A Compiler for CAM-based In-memory Accelerators

Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von ...

research-article
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning

Efficiently training large language models (LLMs) necessitates the adoption of hybrid parallel methods, integrating multiple communications collectives within distributed partitioned graphs. Overcoming communication bottlenecks is crucial and is often ...

research-article
Open Access
Characterizing a Memory Allocator at Warehouse Scale

Memory allocation constitutes a substantial component of warehouse-scale computation. Optimizing the memory allocator not only reduces the datacenter tax, but also improves application performance, leading to significant cost savings.

We present the ...

research-article
Open Access
Characterizing Power Management Opportunities for LLMs in the Cloud

Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and other enterprises plan to substantially grow their datacenter capacity to support ...

CSSTs: A Dynamic Data Structure for Partial Orders in Concurrent Execution Analysis

Dynamic analyses are a standard approach to analyzing and testing concurrent programs. Such techniques observe program traces σ and analyze them to infer the presence or absence of bugs. At its core, each analysis maintains a partial order P that ...

research-article
Open Access
Dr. DNA: Combating Silent Data Corruptions in Deep Learning using Distribution of Neuron Activations

Deep neural networks (DNNs) have been widely adopted in various safety-critical applications such as computer vision and autonomous driving. However, as technology scales and applications diversify, coupled with the increasing heterogeneity of underlying ...

research-article
DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores

Sparse Matrix-Matrix Multiplication (SpMM) is a building-block operation in scientific computing and machine learning applications. Recent advancements in hardware, notably Tensor Cores (TCs), have created promising opportunities for accelerating SpMM. ...

research-article
Open Access
Energy-Adaptive Buffering for Efficient, Responsive, and Persistent Batteryless Systems

Batteryless energy harvesting systems enable a wide array of new sensing, computation, and communication platforms untethered by power delivery or battery maintenance demands. Energy harvesters charge a buffer capacitor from an unreliable environmental ...

Enforcing C/C++ Type and Scope at Runtime for Control-Flow and Data-Flow Integrity

Control-flow hijacking and data-oriented attacks are becoming more sophisticated. These attacks, especially data-oriented attacks, can result in critical security threats, such as leaking an SSL key. Data-oriented attacks are hard to defend against with ...

EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree

As deep learning models become increasingly complex, the deep learning compilers are critical for enhancing the system efficiency and unlocking hidden optimization opportunities. Although excellent speedups have been achieved in inference workloads, ...

Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures

Performance models are instrumental for optimizing performance-sensitive code. When modeling the use of functional units of out-of-order x86-64 CPUs, data availability varies by the manufacturer: Instruction-to-port mappings for Intel's processors are ...

FaaSMem: Improving Memory Efficiency of Serverless Computing with Memory Pool Architecture

In serverless computing, an idle container is not recycled directly, in order to mitigate time-consuming cold container startup. These idle containers still occupy the memory, exacerbating the memory shortage of today's data centers. By offloading their ...

research-article
Open Access
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning

Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine learning. However, due to the unstructured sparsity of SpTA, the general-purpose processors (e.g., GPU and CPU) are inefficient because of the underutilized hardware ...

research-article
Open Access
Felix: Optimizing Tensor Programs with Gradient Descent

Obtaining high-performance implementations of tensor programs such as deep neural networks on a wide range of hardware remains a challenging task. Search-based tensor program optimizers can automatically find high-performance programs on a given hardware ...

Fermihedral: On the Optimal Compilation for Fermion-to-Qubit Encoding

This paper introduces Fermihedral, a compiler framework focusing on discovering the optimal Fermion-to-qubit encoding for targeted Fermionic Hamiltonians. Fermion-to-qubit encoding is a crucial step in harnessing quantum computing for efficient ...

Flexible Non-intrusive Dynamic Instrumentation for WebAssembly

A key strength of managed runtimes over hardware is the ability to gain detailed insight into the dynamic execution of programs with instrumentation. Analyses such as code coverage, execution frequency, tracing, and debugging, are all made easier in a ...

Contributors
  • Technion - Israel Institute of Technology
  • Microsoft Research
  • University of California, Riverside
  • University of California, Riverside

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
Year           Submitted   Accepted   Rate
ASPLOS '19     351         74         21%
ASPLOS '18     319         56         18%
ASPLOS '17     320         53         17%
ASPLOS '16     232         53         23%
ASPLOS '15     287         48         17%
ASPLOS '14     217         49         23%
ASPLOS XV      181         32         18%
ASPLOS XIII    127         31         24%
ASPLOS XII     158         38         24%
ASPLOS X       175         24         14%
ASPLOS IX      114         24         21%
ASPLOS VIII    123         28         23%
ASPLOS VII     109         25         23%
Overall        2,713       535        20%