DOI: 10.1145/2258996.2259002
research-article

GPUs as an opportunity for offloading garbage collection

Published: 15 June 2012

Abstract

GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated CPU/GPU combinations, may create an opportunity to utilize these otherwise unused cycles for offloading traditional systems tasks. Garbage collection appears to be a particularly promising candidate for offloading, due to the popularity of managed languages on consumer devices.
We investigate the challenges of offloading garbage collection to a GPU by examining the performance trade-offs of the mark phase of a mark & sweep garbage collector. We present a theoretical analysis and an algorithm that demonstrates the feasibility of this approach. We also discuss a number of algorithmic design trade-offs required to leverage the strengths and capabilities of the GPU hardware. Our algorithm has been integrated into the Jikes RVM, and we present promising performance results.
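The abstract does not spell out the algorithm. The sketch below only illustrates why the mark phase fits SIMT hardware at all. It is a minimal, generic level-synchronous (frontier-based, breadth-first) marker, the same traversal structure used in GPU graph-search work. It is not the paper's algorithm. The compressed (CSR-style) heap layout with `offsets` and `refs` and the function name `mark_level_synchronous` are hypothetical choices made for illustration. The GPU threads are simulated sequentially in Python.

```python
from typing import List, Sequence


def mark_level_synchronous(offsets: Sequence[int], refs: Sequence[int],
                           roots: Sequence[int]) -> List[bool]:
    """Frontier-based mark phase over a heap graph (illustrative sketch).

    Hypothetical layout: object i's outgoing references are
    refs[offsets[i]:offsets[i + 1]], so len(offsets) == number of objects + 1.
    """
    num_objects = len(offsets) - 1
    marked = [False] * num_objects

    # Seed the first frontier with the (deduplicated) root set.
    frontier: List[int] = []
    for root in roots:
        if not marked[root]:
            marked[root] = True
            frontier.append(root)

    # Each pass of this loop corresponds to one data-parallel step
    # (e.g. one kernel launch) with one thread per frontier object.
    while frontier:
        next_frontier: List[int] = []
        for obj in frontier:  # simulated sequentially here
            for child in refs[offsets[obj]:offsets[obj + 1]]:
                # On a GPU this check-and-set would have to be an atomic
                # operation so two threads cannot enqueue the same child.
                if not marked[child]:
                    marked[child] = True
                    next_frontier.append(child)
        frontier = next_frontier

    return marked
```

For example, with `offsets = [0, 2, 3, 4, 4, 5, 6]` and `refs = [1, 2, 3, 3, 0, 5]`, marking from root `0` reaches objects 0 through 3. Object 4 points into the live set but is not itself reachable, and object 5 only references itself, so both stay unmarked and would be reclaimed by the sweep. Processing one whole frontier per step keeps all threads doing the same kind of work. The cost is one synchronization point per level of heap depth, so deep, narrow object graphs expose little parallelism.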




Published In

ISMM '12: Proceedings of the 2012 international symposium on Memory Management
June 2012
152 pages
ISBN:9781450313506
DOI:10.1145/2258996
Also published in ACM SIGPLAN Notices, Volume 47, Issue 11 (ISMM '12)
November 2012
136 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2426642
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. apu
  2. gpu
  3. mark and sweep
  4. parallel garbage collection
  5. simt


Conference

ISMM '12

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%


Article Metrics

  • Downloads (last 12 months): 28
  • Downloads (last 6 weeks): 1
Reflects downloads up to 17 Feb 2025.


Cited By

  • (2024) Certified SAT solving with GPU accelerated inprocessing. Formal Methods in System Design, 62:1-3, pp. 79-118. DOI: 10.1007/s10703-023-00432-z. Online publication date: 1-Jun-2024.
  • (2023) Innermost many-sorted term rewriting on GPUs. Science of Computer Programming, 225:C. DOI: 10.1016/j.scico.2022.102910. Online publication date: 1-Jan-2023.
  • (2020) Running parallel bytecode interpreters on heterogeneous hardware. Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming, pp. 31-35. DOI: 10.1145/3397537.3397563. Online publication date: 23-Mar-2020.
  • (2019) Design, implementation, and application of GPU-based Java bytecode interpreters. Proceedings of the ACM on Programming Languages, 3:OOPSLA, pp. 1-28. DOI: 10.1145/3360603. Online publication date: 10-Oct-2019.
  • (2018) A hardware accelerator for tracing garbage collection. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 138-151. DOI: 10.1109/ISCA.2018.00022. Online publication date: 2-Jun-2018.
  • (2016) FastCollect. Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp. 1-10. DOI: 10.1145/2968455.2968520. Online publication date: 1-Oct-2016.
  • (2016) MAMBO. ACM Transactions on Architecture and Code Optimization, 13:1, pp. 1-26. DOI: 10.1145/2896451. Online publication date: 5-Apr-2016.
  • (2016) R-GPU. ACM Transactions on Architecture and Code Optimization, 13:1, pp. 1-24. DOI: 10.1145/2890506. Online publication date: 7-Mar-2016.
  • (2016) Thread-Aware Adaptive Prefetcher on Multicore Systems. ACM Transactions on Architecture and Code Optimization, 13:1, pp. 1-25. DOI: 10.1145/2890505. Online publication date: 28-Mar-2016.
  • (2016) Clustering-Based Selection for the Exploration of Compiler Optimization Sequences. ACM Transactions on Architecture and Code Optimization, 13:1, pp. 1-28. DOI: 10.1145/2883614. Online publication date: 28-Mar-2016.
