research-article

GPUs as an opportunity for offloading garbage collection

Authors:

Jeffrey Morlan,

Krste Asanović,

Anthony D. Joseph,

John Kubiatowicz

ISMM '12: Proceedings of the 2012 international symposium on Memory Management

Pages 25 - 36

https://doi.org/10.1145/2258996.2259002

Published: 15 June 2012

Abstract

GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated CPU/GPU combinations, may create an opportunity to utilize these otherwise unused cycles for offloading traditional systems tasks. Garbage collection appears to be a particularly promising candidate for offloading, due to the popularity of managed languages on consumer devices.

We investigate the challenges of offloading garbage collection to a GPU by examining the performance trade-offs for the mark phase of a mark & sweep garbage collector. We present a theoretical analysis and an algorithm that demonstrates the feasibility of this approach. We also discuss a number of algorithmic design trade-offs required to leverage the strengths and capabilities of the GPU hardware. Our algorithm has been integrated into the Jikes RVM, and we present promising performance results.
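The abstract does not spell out the algorithm, so the following is only a rough illustration of what the mark phase of a mark & sweep collector computes, not the authors' GPU implementation. It is a minimal sequential Python sketch that marks objects one frontier at a time, which is the general shape a data-parallel version might take (each frontier entry could in principle be handled by its own GPU thread). The heap layout (a dictionary from object id to referenced ids) and the names `mark_reachable`, `heap`, and `roots` are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set


def mark_reachable(heap: Dict[int, List[int]], roots: Iterable[int]) -> Set[int]:
    """Return the ids of all objects reachable from the roots.

    `heap` maps an object id to the ids it references. This layout is a
    hypothetical stand-in for a real object graph, not the Jikes RVM format.
    """
    marked: Set[int] = set()
    frontier: List[int] = []

    # Seed the first frontier with the root objects.
    for root in roots:
        if root in heap and root not in marked:
            marked.add(root)
            frontier.append(root)

    # Process the graph one frontier (breadth-first level) at a time.
    while frontier:
        next_frontier: List[int] = []
        # Sequential here; a data-parallel version could assign one
        # frontier entry to each thread and mark with atomic operations.
        for obj in frontier:
            for ref in heap[obj]:
                if ref in heap and ref not in marked:
                    marked.add(ref)
                    next_frontier.append(ref)
        frontier = next_frontier

    return marked


# Objects 4 and 5 reference each other but are unreachable from root 0.
heap = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [5], 5: [4]}
live = mark_reachable(heap, roots=[0])
garbage = set(heap) - live  # what a subsequent sweep phase would reclaim
```

In this toy heap the cycle between objects 4 and 5 is never marked, so a sweep would reclaim both; tracing collectors handle such cycles without any special treatment.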

Index Terms

GPUs as an opportunity for offloading garbage collection
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
  2. Parallel computing methodologies
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Memory management
          1. Garbage collection

Information & Contributors

Information

Published In

ISMM '12: Proceedings of the 2012 international symposium on Memory Management

June 2012

152 pages

ISBN:9781450313506

DOI:10.1145/2258996

General Chair: Martin Vechev (ETH Zurich)
Program Chair: Kathryn S. McKinley (The University of Texas at Austin and Microsoft Research)

ACM SIGPLAN Notices Volume 47, Issue 11
ISMM '12
November 2012
136 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2426642

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISMM '12

Sponsor:

SIGPLAN

ISMM '12: International Symposium on Memory Management

June 15 - 16, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%
