On behalf of the Steering Committee, we welcome you to the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO '20) and to San Diego.
The Program Committee, co-chaired by Jingling Xue and Peng Wu, has prepared an excellent technical program comprising 22 research papers plus, new this year, 3 tool and practical experience papers. We would like to express our appreciation to the Program Committee, all authors who submitted papers, and all external reviewers and volunteers for their contributions.
In addition to the main symposium, we have a set of exciting workshops and tutorials thanks to the work of Johann Hauswald and Yunqi Zhang. There are 5 workshops and tutorials planned for the weekend before the symposium, and together with CC, PPoPP and HPCA, there are over 20 highly anticipated co-located events.
Organizing a conference as complex as CGO requires outstanding dedication and support from its organizers, and CGO '20 is no different. Christophe Dubach served as both sponsor chair and finance chair, taking on the hard work of balancing the budget and securing financial support from our sponsors. Michel Steuwer, Bastian Hagedorn, and Michael Laurenzano took care of the difficult and important task of chairing the artifact evaluation. Animesh Jain oversaw the administration of the student travel awards. Dongyoon Lee took on the difficult task of handling registration. Changhee Jung organized the student research competition, which is also an important event for the conference. We also want to mention Rajiv Gupta, general chair of PPoPP, who handled the hotel and local arrangements. We conclude our thanks with Aaron Smith as proceedings chair, Dongjie He as web chair, Fabian Grüber as publicity chair, and our general co-chairs Lingjia Tang and Jason Mars.
The Steering Committee provided advice throughout the organization of the event, as did past CGO organizers. We would especially like to thank Aaron Smith for his invaluable support. We would also like to thank the PPoPP and HPCA conferences for another opportunity to co-locate with them.
We thank all our financial sponsors for their continued generous support: ACM, IEEE, the US National Science Foundation, Alibaba Group, arm, Facebook, Futurewei, Google, Microsoft, Oracle, Uber, SIGMICRO, and SIGPLAN.
Finally, thank you all for attending, and once again, welcome to CGO!
Fabrice Rastello, Inria, France
Proceeding Downloads
Efficient nursery sizing for managed languages on multi-core processors with shared caches
In modern programming languages, automatic memory management has become a standard feature for allocating and freeing memory. In this paper, we show that the performance of today’s managed languages can degrade significantly due to cache contention ...
Type freezing: exploiting attribute type monomorphism in tracing JIT compilers
Dynamic programming languages continue to increase in popularity. While just-in-time (JIT) compilation can improve the performance of dynamic programming languages, a significant performance gap remains with respect to ahead-of-time compiled languages. ...
Low-cost prediction-based fault protection strategy
Increasing failures from transient faults necessitate a cost-efficient protection mechanism that is always activated. Thus, we propose a novel prediction-based transient fault protection strategy as a low-cost software-only technique. Instead of ...
Secure automatic bounds checking: prevention is simpler than cure
Recent Spectre attacks exploit hardware speculative execution to read forbidden data. The attacks speculatively load forbidden data in misspeculated paths creating a side channel via the microarchitectural state which is not cleaned up after a ...
Aloe: verifying reliability of approximate programs in the presence of recovery mechanisms
Modern hardware is becoming increasingly susceptible to silent data corruptions. As general methods for detection and recovery from errors are time and energy consuming, selective detection and recovery are promising alternatives for applications that ...
Interactive debugging of concurrent programs under relaxed memory models
Programming environments for sequential programs provide strong debugging support. However, concurrent programs, especially under relaxed memory models, lack powerful interactive debugging tools. In this work, we present Gambit, an interactive debugging ...
Testing static analyses for precision and soundness
Static analyses compute properties of programs that are true in all executions, and compilers use these properties to justify optimizations such as dead code elimination. Each static analysis in a compiler should be as precise as possible while ...
HALO: post-link heap-layout optimisation
Today, general-purpose memory allocators dominate the landscape of dynamic memory management. While these solutions can provide reasonably good behaviour across a wide range of workloads, it is an unfortunate reality that their behaviour for any ...
Efficient and scalable cross-ISA virtualization of hardware transactional memory
System virtualization is a key enabling technology. However, existing virtualization techniques suffer from a significant limitation due to their limited cross-ISA support for emerging architecture-specific hardware extensions. To address this issue, we ...
Speculative reconvergence for improved SIMT efficiency
- Sana Damani,
- Daniel R. Johnson,
- Mark Stephenson,
- Stephen W. Keckler,
- Eddie Yan,
- Michael McKeown,
- Olivier Giroux
GPUs perform most efficiently when all threads in a warp execute the same sequence of instructions convergently. However, when threads in a warp encounter a divergent branch, the hardware serializes the execution of diverged paths. We consider a class ...
Optimizing occupancy and ILP on the GPU using a combinatorial approach
This paper presents the first general solution to the problem of optimizing both occupancy and Instruction-Level Parallelism (ILP) when compiling for a Graphics Processing Unit (GPU). Exploiting ILP (minimizing schedule length) requires using more ...
Multi-layer optimizations for end-to-end data analytics
We consider the problem of training machine learning models over multi-relational data. The mainstream approach is to first construct the training dataset using a feature extraction query over the input database and then use a statistical software package ...
Optimizing ordered graph algorithms with GraphIt
- Yunming Zhang,
- Ajay Brahmakshatriya,
- Xinyi Chen,
- Laxman Dhulipala,
- Shoaib Kamil,
- Saman Amarasinghe,
- Julian Shun
Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific ...
A performance-optimizing compiler for cyber-physical digital microfluidic biochips
This paper introduces a compiler optimization strategy for Software-Programmable Laboratories-on-a-Chip (SP-LoCs), which miniaturize and automate a wide variety of benchtop laboratory experiments. The compiler targets a specific class of SP-LoCs that ...
CogniCryptGEN: generating code for the secure usage of crypto APIs
Many software applications are insecure because they misuse cryptographic APIs. Prior attempts to address misuses focused on detecting them after the fact. However, avoiding such misuses in the first place would significantly reduce development cost. ...
AN5D: automated stencil framework for high-degree temporal blocking on GPUs
Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure ...
The design and implementation of the Wolfram Language compiler
The popularity of data- and scientific-oriented applications, abundance of on-demand compute resources, and scarcity of domain expert programmers have given rise to high-level scripting languages. These high-level scripting languages offer a fast way to ...
SIMD support in .NET: abstract and concrete vector types and operations
This paper describes SIMD (Single Instruction Multiple Data) APIs for .NET that expose the parallel execution capabilities available on modern processors. These APIs include both platform-independent and platform-specific APIs that expose the SIMD ...
NeuroVectorizer: end-to-end vectorization with deep reinforcement learning
One of the key challenges arising when compilers vectorize loops for today’s SIMD-compatible architectures is to decide if vectorization or interleaving is beneficial. Then, the compiler has to determine the number of instructions to pack together and ...
Introducing the pseudorandom value generator selection in the compilation toolchain
As interest in randomization has grown within the computing community, the number of pseudorandom value generators (PRVGs) at developers' disposal has increased dramatically. Today, developers lack the tools necessary to obtain optimal behavior from their ...
COLAB: a collaborative multi-factor scheduler for asymmetric multicore processors
Increasingly prevalent asymmetric multicore processors (AMP) are necessary for delivering performance in the era of limited power budget and dark silicon. However, the software fails to use them efficiently. OS schedulers, in particular, handle ...
PreScaler: an efficient system-aware precision scaling framework on heterogeneous systems
Graphics processing units (GPUs) have been commonly utilized to accelerate multiple emerging applications, such as big data processing and machine learning. While GPUs are proven to be effective, approximate computing, to trade off performance with ...
ATMem: adaptive data placement in graph applications on heterogeneous memories
Active development in new memory devices, such as non-volatile memories and high-bandwidth memories, brings heterogeneous memory systems (HMS) as a promising solution for implementing large-scale memory systems with cost, area, and power limitations. ...
Automatic generation of high-performance quantized machine learning kernels
Quantization optimizes machine learning inference for resource constrained environments by reducing the precision of its computation. In the extreme, even single-bit computations can produce acceptable results at dramatically lower cost. But this ultra-...
Deriving parametric multi-way recursive divide-and-conquer dynamic programming algorithms using polyhedral compilers
- Mohammad Mahdi Javanmard,
- Zafar Ahmad,
- Martin Kong,
- Louis-Noël Pouchet,
- Rezaul Chowdhury,
- Robert Harrison
We present a novel framework to automatically derive highly efficient parametric multi-way recursive divide&conquer algorithms for a class of dynamic programming (DP) problems. Standard two-way or any fixed R-way recursive divide&conquer algorithms may ...
Index Terms
- Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization