DOI: 10.1145/3620665
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
ACM 2024 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, La Jolla, CA, USA, 27 April 2024 - 1 May 2024
ISBN:
979-8-4007-0385-0
Published:
27 April 2024
Abstract

Welcome to the second volume of ASPLOS'24: the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. This document is dedicated to the 2024 summer review cycle.

We introduced several notable changes to ASPLOS this year, many of which were discussed in the message from the program chairs in Volume 1. To avoid repetition, we assume that readers have already read that message, and we describe only the differences between the current cycle and the previous one. These include: (1) developing and using an automated script that identifies format violations, focused on uncovering disallowed vertical-space manipulations that "squeeze" space; (2) incorporating author-declared best-matching topics into our review assignment process; (3) introducing the new ASPLOS role of Program Vice Chairs to cope with the increased number of submissions and the added load caused by forgoing synchronous program committee (PC) meetings, which necessitated additional managerial involvement in online dissensions; and (4) characterizing a systematic problem that ASPLOS is facing in reviewing quantum computing submissions, describing how we addressed it, and highlighting how we believe that it should be handled in the future.

Key statistics of the ASPLOS'24 summer cycle include: 409 submissions were finalized (about 1.5x more than last year's summer count and nearly 2.4x more than our spring cycle), with 107 related to accelerators/FPGAs/GPUs, 97 to machine learning, 88 to storage/memory, 80 to security, and 69 to datacenter/cloud; 179 (44%) submissions were promoted to the second review round; 54 (13.2%) papers were accepted (with 20 awarded one or more artifact evaluation badges); 33 (8.1%) submissions were allowed to submit major revisions, of which 27 were subsequently accepted during the fall cycle (with 13 awarded one or more artifact evaluation badges); 1,499 reviews were uploaded; and 5,557 comments were generated during online discussions.
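The percentages quoted above can be rechecked from the raw counts. The following is a minimal sketch, not part of the chairs' message; the helper name `share` and the dictionary labels are illustrative assumptions, and the combined acceptance figure at the end is derived here rather than stated in the source.

```python
# Counts quoted in the summer-cycle statistics (409 finalized submissions).
submissions = 409
counts = {
    "promoted to second round": 179,
    "accepted": 54,
    "invited to submit a major revision": 33,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical helper output: label -> percentage of all submissions.
shares = {label: share(n, submissions) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")

# Derived figure (an assumption of this sketch, not quoted in the text):
# papers accepted directly plus the 27 accepted after a major revision.
total_accepted = 54 + 27
total_share = share(total_accepted, submissions)
print(f"accepted including revisions: {total_share}%")
```

At one decimal place the second-round share comes out at 43.8%, which the text rounds to 44%.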

Analyzing the per-submission most-related broader areas of research, which we asked authors to associate with their work in the submission form, revealed that 71%, 47%, and 28% of the submissions are categorized by their authors as related to architecture, operating systems, and programming languages, respectively, with about 45% being "interdisciplinary" submissions (associated with more than one area). The full details are available in the PDF of the front matter.

research-article
Open Access
A Fault-Tolerant Million Qubit-Scale Distributed Quantum Computer

A million qubit-scale quantum computer is essential to realize quantum supremacy. Modern large-scale quantum computers integrate multiple quantum computers located in dilution refrigerators (DR) to overcome each DR's unscaling cooling budget. However,...

research-article
Open Access
A Journey of a 1,000 Kernels Begins with a Single Step: A Retrospective of Deep Learning on GPUs

We are in the age of AI, with rapidly changing algorithms and a somewhat synergistic change in hardware. MLPerf is a recent benchmark suite that serves as a way to compare and evaluate hardware. However, it has several drawbacks: it is dominated by CNNs and ...

A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors

As semiconductor power density is no longer constant with the technology process scaling down, we need different solutions if we are to continue scaling application performance. To this end, modern CPUs are integrating capable data accelerators on the ...

research-article
Achieving Near-Zero Read Retry for 3D NAND Flash Memory

As the flash-based storage devices age with program/erase (P/E) cycles, they require an increasing number of read retries for error correction, which in turn deteriorates their read performance. The design of read-retry methods is critical to flash read ...

research-article
An Encoding Scheme to Enlarge Practical DNA Storage Capacity by Reducing Primer-Payload Collisions

Deoxyribonucleic Acid (DNA), with its ultra-high storage density and long durability, is a promising long-term archival storage medium and is attracting much attention today. A DNA storage system encodes and stores digital data with synthetic DNA ...

research-article
Atalanta: A Bit is Worth a “Thousand” Tensor Values

Atalanta is a lossless, hardware/software co-designed compression technique for the tensors of fixed-point quantized deep neural networks. Atalanta increases effective memory capacity, reduces off-die traffic, and/or helps to achieve the desired ...

research-article
AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference

The Transformer-based generative model (TbGM), comprising summarization (Sum) and generation (Gen) stages, has demonstrated unprecedented generative performance across a wide range of applications. However, it also demands immense amounts of compute and ...

Avoiding Instruction-Centric Microarchitectural Timing Channels Via Binary-Code Transformations

With the end of Moore's Law-based scaling, novel microarchitectural optimizations are being patented, researched, and implemented at an increasing rate. Previous research has examined recently published patents and papers and demonstrated ways these ...

research-article
Open Access
BitPacker: Enabling High Arithmetic Efficiency in Fully Homomorphic Encryption Accelerators

Fully Homomorphic Encryption (FHE) enables computing directly on encrypted data. Though FHE is slow on a CPU, recent hardware accelerators compensate for most of FHE's overheads, enabling real-time performance in complex programs like deep neural networks. ...

research-article
BVAP: Energy and Memory Efficient Automata Processing for Regular Expressions with Bounded Repetitions

Regular pattern matching is pervasive in applications such as text processing, malware detection, network security, and bioinformatics. Recent studies have demonstrated specialized in-memory automata processors with superior energy and memory ...

Carat: Unlocking Value-Level Parallelism for Multiplier-Free GEMMs

In recent years, hardware architectures optimized for general matrix multiplication (GEMM) have been well studied to deliver better performance and efficiency for deep neural networks. With trends towards batched, low-precision data, e.g., FP8 format in ...

research-article
Open Access
CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size, and crossbar ...

research-article
CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing

Video Transformers (VidTs) have reached the forefront of accuracy in various video understanding tasks. Despite their remarkable achievements, the processing requirements for a large number of video frames still present a significant performance ...

research-article
Open Access
Codesign of quantum error-correcting codes and modular chiplets in the presence of defects

Fabrication errors pose a significant challenge in scaling up solid-state quantum devices to the sizes required for fault-tolerant (FT) quantum applications. To mitigate the resource overhead caused by fabrication errors, we combine two approaches: (1) ...

Compiling Loop-Based Nested Parallelism for Irregular Workloads

Modern programming languages offer special syntax and semantics for logical fork-join parallelism in the form of parallel loops, allowing them to be nested, e.g., a parallel loop within another parallel loop. This expressiveness comes at a price, however:...

research-article
Open Access
Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety

Violations of temporal memory safety ("use after free", "UAF") continue to pose a significant threat to software security. The CHERI capability architecture has shown promise as a technology for C and C++ language reference integrity and spatial memory ...

Design of Novel Analog Compute Paradigms with Ark

Previous efforts on reconfigurable analog circuits mostly focused on specialized analog circuits, produced through careful co-design, or on highly reconfigurable, but relatively resource inefficient, accelerators that implement analog compute paradigms. ...

Direct Memory Translation for Virtualized Clouds

Virtual memory translation has become a key performance bottleneck of memory-intensive workloads in virtualized cloud environments. On the x86 architecture, a nested translation needs to sequentially fetch up to 24 page table entries (PTEs). This paper ...

research-article
Open Access
Efficient Microsecond-scale Blind Scheduling with Tiny Quanta

A longstanding performance challenge in datacenter-based applications is how to efficiently handle incoming client requests that spawn many very short (μs scale) jobs that must be handled with high throughput and low tail latency. When no assumptions are ...

research-article
Eliminating Storage Management Overhead of Deduplication over SSD Arrays Through a Hardware/Software Co-Design

This paper presents a hardware/software co-design solution to efficiently implement block-layer deduplication over SSD arrays. By introducing complex and varying dependency over the entire storage space, deduplication is infamously subject to high ...

research-article
Open Access
Elivagar: Efficient Quantum Circuit Search for Classification

Designing performant and noise-robust circuits for Quantum Machine Learning (QML) is challenging: the design space scales exponentially with circuit size, and there are few well-supported guiding principles for QML circuit design. Although recent ...

research-article
Open Access
Energy Efficient Convolutions with Temporal Arithmetic

Convolution is an important operation at the heart of many applications, including image processing, object detection, and neural networks. While data movement and coordination operations continue to be important areas for optimization in general-purpose ...

research-article
Open Access
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference

This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference. ExeGPT finds and runs with an optimal execution schedule to maximize inference throughput while satisfying a given latency constraint. By leveraging the ...

FaaSGraph: Enabling Scalable, Efficient, and Cost-Effective Graph Processing with Serverless Computing

Graph processing is widely used in cloud services; however, current frameworks face challenges in efficiency and cost-effectiveness when deployed under the Infrastructure-as-a-Service model due to its limited elasticity. In this paper, we present ...

FOCAL: A First-Order Carbon Model to Assess Processor Sustainability

Sustainability in general and global warming in particular are grand societal challenges. Computer systems demand substantial materials and energy resources throughout their entire lifetime. A key question is how computer engineers and scientists can ...

FPGA Technology Mapping Using Sketch-Guided Program Synthesis

FPGA technology mapping is the process of implementing a hardware design expressed in high-level HDL (hardware design language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, ...

GIANTSAN: Efficient Memory Sanitization with Segment Folding

Memory safety sanitizers, the sharp weapon for detecting invalid memory operations during execution, employ runtime metadata to model the memory and help find memory errors hidden in the programs. However, location-based methods, the most widely deployed ...

research-article
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational ...

research-article
Open Access
Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph Processing

As graphs have grown exponentially in recent years, out-of-core graph systems have been invented to process large-scale graphs by keeping massive data in storage. Among them, many systems process the graphs iteration-by-iteration and provide synchronous semantics ...

Greybox Fuzzing for Concurrency Testing

Uncovering bugs in concurrent programs is a challenging problem owing to the exponentially large search space of thread interleavings. Past approaches towards concurrency testing are either optimistic, relying on random sampling of these interleavings ...

Contributors
  • Technion - Israel Institute of Technology
  • Microsoft Research
  • University of California, Riverside
  • University of California, Riverside

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
Year          Submitted  Accepted  Rate
ASPLOS '19          351        74   21%
ASPLOS '18          319        56   18%
ASPLOS '17          320        53   17%
ASPLOS '16          232        53   23%
ASPLOS '15          287        48   17%
ASPLOS '14          217        49   23%
ASPLOS XV           181        32   18%
ASPLOS XIII         127        31   24%
ASPLOS XII          158        38   24%
ASPLOS X            175        24   14%
ASPLOS IX           114        24   21%
ASPLOS VIII         123        28   23%
ASPLOS VII          109        25   23%
Overall           2,713       535   20%
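
The overall row of the acceptance-rate listing can be cross-checked against the per-year rows. This is a small sketch under the assumption that each row reads as year, submitted, accepted, rate; the variable names are illustrative and not taken from the ACM page.

```python
# (year, submitted, accepted) as read from the acceptance-rate listing.
rows = [
    ("ASPLOS '19", 351, 74),
    ("ASPLOS '18", 319, 56),
    ("ASPLOS '17", 320, 53),
    ("ASPLOS '16", 232, 53),
    ("ASPLOS '15", 287, 48),
    ("ASPLOS '14", 217, 49),
    ("ASPLOS XV", 181, 32),
    ("ASPLOS XIII", 127, 31),
    ("ASPLOS XII", 158, 38),
    ("ASPLOS X", 175, 24),
    ("ASPLOS IX", 114, 24),
    ("ASPLOS VIII", 123, 28),
    ("ASPLOS VII", 109, 25),
]

# Column totals should match the "Overall" row (2,713 submitted, 535 accepted).
total_submitted = sum(submitted for _, submitted, _ in rows)
total_accepted = sum(accepted for _, _, accepted in rows)

# Rates rounded to whole percentages, as the listing reports them.
rates = [round(100 * accepted / submitted) for _, submitted, accepted in rows]
overall_rate = round(100 * total_accepted / total_submitted)

print(total_submitted, total_accepted, f"{overall_rate}%")
```

The listing covers only the editions shown, so the overall figure describes those thirteen conferences rather than the 2024 cycles discussed in the abstract.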