Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management (ISMM 2019)
Automatic GPU memory management for large neural models in TensorFlow
Deep learning models are becoming larger and no longer fit in the limited memory of accelerators such as GPUs during training. Though many methods have been proposed to solve this problem, they are rather ad-hoc in nature and difficult to extend and ...
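As a rough illustration of the swap-out/swap-in idea that such memory-management systems automate (a generic sketch, not the paper's mechanism; `SwappableTensor` and its methods are hypothetical names), a tensor can be staged to pinned host memory while its GPU space is needed elsewhere:

```cpp
// Minimal sketch of tensor swap-out/swap-in between GPU and host memory.
// Illustrates the general idea only; error handling is elided for brevity.
#include <cuda_runtime.h>
#include <cstddef>

struct SwappableTensor {
    float* device = nullptr;  // GPU copy (null while swapped out)
    float* host   = nullptr;  // pinned host staging buffer
    size_t count  = 0;

    void init(size_t n) {
        count = n;
        cudaMallocHost((void**)&host, n * sizeof(float)); // pinned for async copies
        cudaMalloc((void**)&device, n * sizeof(float));
    }
    // Move the tensor to host memory to free GPU space for other layers.
    void swap_out(cudaStream_t s) {
        cudaMemcpyAsync(host, device, count * sizeof(float),
                        cudaMemcpyDeviceToHost, s);
        cudaStreamSynchronize(s);
        cudaFree(device);
        device = nullptr;
    }
    // Bring the tensor back before a later operation needs it.
    void swap_in(cudaStream_t s) {
        cudaMalloc((void**)&device, count * sizeof(float));
        cudaMemcpyAsync(device, host, count * sizeof(float),
                        cudaMemcpyHostToDevice, s);
    }
};
```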
Massively parallel GPU memory compaction
Memory fragmentation is a widely studied problem of dynamic memory allocators. It is well known that fragmentation can lead to premature out-of-memory errors and poor cache performance. With the recent emergence of dynamic memory allocators for SIMD ...
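For context, compaction relocates live blocks to one end of the heap so that free space becomes contiguous. The sketch below is a deliberately sequential illustration of that operation (the paper's contribution is performing it massively in parallel on GPUs); `Block` and `compact` are hypothetical names:

```cpp
// Sequential sketch of heap compaction: live blocks slide toward the start
// of the heap, eliminating fragmentation gaps between them.
#include <cstring>
#include <cstddef>
#include <utility>
#include <vector>

struct Block { size_t offset, size; bool live; };

// Returns old-offset -> new-offset pairs so pointers can be rewritten.
std::vector<std::pair<size_t, size_t>>
compact(unsigned char* heap, std::vector<Block>& blocks) {
    std::vector<std::pair<size_t, size_t>> relocation;
    size_t cursor = 0;                       // next free position
    for (auto& b : blocks) {
        if (!b.live) continue;               // dead blocks are dropped
        if (b.offset != cursor)
            std::memmove(heap + cursor, heap + b.offset, b.size);
        relocation.push_back({b.offset, cursor});
        b.offset = cursor;
        cursor += b.size;
    }
    return relocation;
}
```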
Scaling up parallel GC work-stealing in many-core environments
Parallel copying garbage collection (GC) is widely used in de facto standard Java virtual machines such as OpenJDK and OpenJ9. OpenJDK uses work-stealing for copying objects in the Parallel GC and Garbage-First (G1) GC policies to balance the copying task ...
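In work-stealing, each GC thread drains its own task queue and, when that runs dry, steals from a peer. A minimal sketch of such a deque follows; production collectors use lock-free Chase-Lev-style deques, whereas this illustration uses a mutex for brevity:

```cpp
// Simplified sketch of GC work-stealing: the owner pushes and pops at one
// end (LIFO, for locality); idle threads steal from the other end (FIFO).
#include <deque>
#include <mutex>
#include <optional>

struct Task { void* object; };               // e.g., an object to copy/scan

class WorkStealingDeque {
    std::deque<Task> tasks_;
    std::mutex m_;
public:
    void push(Task t) {                      // owner adds work
        std::lock_guard<std::mutex> g(m_);
        tasks_.push_back(t);
    }
    std::optional<Task> pop() {              // owner takes newest task
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = tasks_.back(); tasks_.pop_back();
        return t;
    }
    std::optional<Task> steal() {            // thief takes oldest task
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = tasks_.front(); tasks_.pop_front();
        return t;
    }
};
```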
Exploration of memory hybridization for RDD caching in Spark
Apache Spark is a popular cluster computing framework for iterative analytics workloads due to its use of Resilient Distributed Datasets (RDDs) to cache data for in-memory processing. Our analysis reveals that the performance of the Spark RDD cache can be ...
Learning when to garbage collect with random forests
Generational garbage collectors are among the most common forms of automatic memory management. We can minimize the costs they incur by carefully choosing the points in a program's execution at which they run. However, this decision is generally based ...
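As a toy illustration of the approach (the features, thresholds, and stump "trees" below are invented placeholders; the paper trains real random forests on observed program behavior), a collection decision can be taken by majority vote over an ensemble:

```cpp
// Toy sketch of using a tree ensemble to decide whether to trigger a
// collection now rather than at an arbitrary heap-full point.
#include <functional>
#include <vector>

struct GCFeatures {
    double heap_occupancy;    // fraction of heap in use
    double alloc_rate_mb_s;   // recent allocation rate
    double survivor_ratio;    // fraction of last nursery that survived
};

using Tree = std::function<bool(const GCFeatures&)>;  // one vote per tree

bool should_collect(const GCFeatures& f, const std::vector<Tree>& forest) {
    int votes = 0;
    for (const auto& tree : forest)
        if (tree(f)) ++votes;                // majority vote across trees
    return votes * 2 > (int)forest.size();
}

// Depth-1 stumps standing in for trained trees:
std::vector<Tree> demo_forest() {
    return {
        [](const GCFeatures& f) { return f.heap_occupancy > 0.8; },
        [](const GCFeatures& f) { return f.alloc_rate_mb_s > 500.0; },
        [](const GCFeatures& f) { return f.survivor_ratio < 0.2; },
    };
}
```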
Timescale functions for parallel memory allocation
Memory allocation is increasingly important to parallel performance, yet it is challenging because a program has data of many sizes, and the demand differs from thread to thread. Modern allocators use highly tuned heuristics but do not provide uniformly ...
A lock-free coalescing-capable mechanism for memory management
One common characteristic among current lock-free memory allocators is that they rely on the operating system to manage memory since they lack a lower-level mechanism capable of splitting and coalescing blocks of memory. In this paper, we discuss this ...
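For reference, splitting carves a block into an allocation plus a remainder, and coalescing merges a freed block with its free neighbors. The sketch below shows both operations on an address-ordered block list; it is deliberately not lock-free, which is precisely the gap the paper addresses:

```cpp
// Sketch of the split/coalesce mechanism that the paper makes lock-free.
#include <cstddef>

struct BlockHdr {
    size_t size;
    bool free;
    BlockHdr* prev;   // neighbor at lower address
    BlockHdr* next;   // neighbor at higher address
};

// Split 'b' so its payload is 'want' bytes; the remainder becomes a new block.
void split(BlockHdr* b, size_t want) {
    if (b->size < want + sizeof(BlockHdr) + 16) return;  // too small to split
    auto* rest = (BlockHdr*)((char*)b + sizeof(BlockHdr) + want);
    rest->size = b->size - want - sizeof(BlockHdr);
    rest->free = true;
    rest->prev = b;
    rest->next = b->next;
    if (b->next) b->next->prev = rest;
    b->next = rest;
    b->size = want;
}

// Merge a freed block with free neighbors to fight fragmentation.
BlockHdr* coalesce(BlockHdr* b) {
    if (b->next && b->next->free) {          // absorb right neighbor
        b->size += sizeof(BlockHdr) + b->next->size;
        b->next = b->next->next;
        if (b->next) b->next->prev = b;
    }
    if (b->prev && b->prev->free) {          // fold into left neighbor
        b->prev->size += sizeof(BlockHdr) + b->size;
        b->prev->next = b->next;
        if (b->next) b->next->prev = b->prev;
        b = b->prev;
    }
    return b;
}
```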
Concurrent marking of shape-changing objects
Efficient garbage collection is a key goal in engineering high-performance runtime systems. To reduce pause times, many collector designs traverse the object graph concurrently with the application, an optimization known as concurrent marking. ...
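Concurrent marking builds on tri-color marking: objects move from white (unseen) through grey (queued) to black (fully scanned). The sequential sketch below shows the traversal itself; the paper's contribution is keeping it sound while objects change shape during the scan:

```cpp
// Sketch of tri-color marking, the traversal that concurrent marking runs
// alongside the application.
#include <deque>
#include <vector>

enum class Color { White, Grey, Black };     // unseen / queued / fully scanned

struct Object {
    Color color = Color::White;
    std::vector<Object*> fields;             // outgoing references
};

void mark(const std::vector<Object*>& roots) {
    std::deque<Object*> worklist;
    for (Object* r : roots) {                // roots start grey
        if (r && r->color == Color::White) {
            r->color = Color::Grey;
            worklist.push_back(r);
        }
    }
    while (!worklist.empty()) {
        Object* o = worklist.front(); worklist.pop_front();
        for (Object* child : o->fields) {    // shade white children grey
            if (child && child->color == Color::White) {
                child->color = Color::Grey;
                worklist.push_back(child);
            }
        }
        o->color = Color::Black;             // all children are now queued
    }
}
```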
Design and analysis of field-logging write barriers
Write barriers are a fundamental mechanism that most production garbage collection algorithms depend on. They inform the collector of mutations to the object graph, enabling partial heap collections, concurrent collection, and reference counting. While ...
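A field-logging barrier records the exact mutated slot, so the collector later rescans only those fields rather than whole objects. A minimal sketch follows, with a set standing in for the per-field log bits that real barriers use for deduplication:

```cpp
// Sketch of a field-logging write barrier: every reference store first
// records the mutated field (slot) in a log the collector later scans.
#include <unordered_set>
#include <vector>

struct Object;                               // payload is irrelevant here

std::vector<Object**> field_log;             // slots to rescan at GC time
std::unordered_set<Object**> logged;         // log each slot at most once

void write_barrier(Object** slot, Object* new_value) {
    if (logged.insert(slot).second)          // first write to this slot?
        field_log.push_back(slot);           // remember the exact field
    *slot = new_value;                       // perform the actual store
}
```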
Gradual write-barrier insertion into a Ruby interpreter
Ruby is a popular object-oriented programming language, and the performance of the Ruby garbage collector (GC) directly affects the execution time of Ruby programs. Ruby 2.0 and earlier versions employed an inefficient non-generational conservative mark-...
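The "gradual" aspect means barriered (WB-protected) and unbarriered objects coexist in one heap: stores to protected objects run a barrier, while unprotected objects are handled conservatively by the collector. A simplified sketch of that split (names and details are illustrative, not Ruby's actual implementation):

```cpp
// Sketch of gradual write-barrier insertion: object types gain barriers one
// at a time; the collector compensates for types that lack them by treating
// every unprotected old object as potentially mutated.
#include <cstddef>
#include <vector>

struct RObject {
    bool wb_protected;                 // does this type emit barriers yet?
    bool remembered;                   // already in the remembered set?
    std::vector<RObject*> fields;
};

std::vector<RObject*> remembered_set;  // objects to rescan at minor GC

void store_field(RObject* obj, size_t i, RObject* value) {
    if (obj->wb_protected && !obj->remembered) {
        obj->remembered = true;        // barrier path: precise remembering
        remembered_set.push_back(obj);
    }
    // Unprotected objects skip the barrier entirely; the collector must
    // rescan them conservatively instead.
    obj->fields[i] = value;
}
```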
snmalloc: a message passing allocator
Paul Liétar, Theodore Butler, Sylvan Clebsch, Sophia Drossopoulou, Juliana Franco, Matthew J. Parkinson, Alex Shamis, Christoph M. Wintersteiger, and David Chisnall
snmalloc is an implementation of malloc aimed at workloads in which objects are typically deallocated by a different thread than the one that allocated them. We use the term producer/consumer for such workloads. snmalloc uses a novel message passing ...
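As an illustration of the producer/consumer pattern (this sketch shows the general message-passing idea, not snmalloc's actual data structures), a thread freeing a block it did not allocate can push it onto the owner's lock-free queue, which the owner drains in batches:

```cpp
// Sketch of remote frees as messages: instead of touching the owner's free
// lists, a consumer thread "sends" the freed block back via a lock-free
// multi-producer single-consumer list.
#include <atomic>

struct FreeMsg {
    FreeMsg* next;                     // freed block doubles as the message
};

class RemoteFreeQueue {
    std::atomic<FreeMsg*> head_{nullptr};
public:
    // Called by any thread: push a freed block onto the owner's queue.
    void send(FreeMsg* block) {
        FreeMsg* old = head_.load(std::memory_order_relaxed);
        do {
            block->next = old;
        } while (!head_.compare_exchange_weak(old, block,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }
    // Called only by the owning thread: take the whole batch at once.
    FreeMsg* drain() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }
};
```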