DOI: 10.1145/1964179
GPGPU-4: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
ACM 2011 Proceeding
Publisher:
Association for Computing Machinery, New York, NY, United States
Conference:
GPGPU-4: Fourth Workshop on General Purpose Processing on Graphics Processing Units, Newport Beach, California, USA, 5 March 2011
ISBN:
978-1-4503-0569-3
Published:
05 March 2011

Abstract

No abstract available.

Table of Contents
SESSION: Applications I
research-article
High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs
Article No.: 1, Pages 1–8
https://doi.org/10.1145/1964179.1964181

Graphics Processing Units (GPUs) are well suited to highly data-parallel algorithms such as image processing because of their massive parallel processing power. Many image processing applications use the histogramming algorithm, which fills a set of bins ...
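
For readers unfamiliar with the trade-offs involved, the sketch below (Python, simulating GPU thread blocks one after another) shows one common histogramming strategy: privatization, where each block fills its own sub-histogram and the copies are merged afterwards. It is a generic illustration, not the algorithm evaluated in the paper; the function name `privatized_histogram` and the contiguous-chunk assignment of input to blocks are assumptions.

```python
from typing import List, Sequence


def privatized_histogram(pixels: Sequence[int], num_bins: int, num_blocks: int) -> List[int]:
    """Histogram with per-block private bins that are merged at the end.

    Each simulated "block" counts a contiguous chunk of the input into its own
    sub-histogram (on a GPU this would usually live in shared memory), so
    different blocks never contend for the same counter. A final pass sums the
    sub-histograms bin by bin.
    """
    chunk = (len(pixels) + num_blocks - 1) // num_blocks  # ceiling division
    private = [[0] * num_bins for _ in range(num_blocks)]

    # Phase 1: independent per-block counting.
    for b in range(num_blocks):
        for value in pixels[b * chunk:(b + 1) * chunk]:
            private[b][value] += 1  # assumes 0 <= value < num_bins

    # Phase 2: merge the private copies into the global histogram.
    return [sum(private[b][k] for b in range(num_blocks)) for k in range(num_bins)]
```

For example, `privatized_histogram([0, 1, 1, 3, 3, 3, 2], 4, 3)` returns `[1, 2, 1, 3]`. Privatization trades extra memory (one histogram per block) and a final merge for fewer conflicting updates to the same bin.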

research-article
A new method for GPU based irregular reductions and its application to k-means clustering
Article No.: 2, Pages 1–8
https://doi.org/10.1145/1964179.1964182

A frequently used clustering method is k-means clustering. The k-means algorithm consists of two steps: a map step, which is simple to execute on a GPU, and a reduce step, which is more problematic. Previous researchers have used a ...
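
A minimal sketch of one k-means iteration, assuming 2-D points and with Python standing in for GPU code, makes the map/reduce split described above concrete. The reduce step is an irregular reduction because each point's destination cluster is only known at run time. This is the textbook algorithm, not the new reduction method proposed in the paper; `kmeans_step` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans_step(points: List[Point], centroids: List[Point]) -> List[Point]:
    """One k-means iteration, split into a map step and a reduce step."""
    k = len(centroids)

    # Map step: each point independently picks its nearest centroid.
    # This is embarrassingly parallel (e.g. one GPU thread per point).
    def nearest(p: Point) -> int:
        return min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                           + (p[1] - centroids[c][1]) ** 2)

    labels = [nearest(p) for p in points]

    # Reduce step: an irregular reduction. The target cluster of each point is
    # data-dependent, so on a GPU these scattered accumulations can conflict.
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        sums[c][0] += p[0]
        sums[c][1] += p[1]
        counts[c] += 1

    # New centroid = mean of assigned points; empty clusters keep the old one.
    return [(sums[c][0] / counts[c], sums[c][1] / counts[c]) if counts[c] else centroids[c]
            for c in range(k)]
```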

SESSION: Optimizations
research-article
Reducing branch divergence in GPU programs
Article No.: 3, Pages 1–8
https://doi.org/10.1145/1964179.1964184

Branch divergence has a significant impact on the performance of GPU programs. We propose two novel software-based optimizations, called iteration delaying and branch distribution, that aim to reduce branch divergence. Iteration delaying targets a ...
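
Why divergence is costly can be shown with a toy SIMT cost model, written here in Python under simplifying assumptions (32-thread warps, a single if/else, fixed per-path cycle counts): a warp whose threads disagree on the branch executes both paths one after the other, with inactive lanes masked off. This is not an implementation of iteration delaying or branch distribution; `warp_cycles` and its cost parameters are hypothetical.

```python
from typing import Sequence


def warp_cycles(conditions: Sequence[bool], cost_if: int, cost_else: int,
                warp_size: int = 32) -> int:
    """Toy cost of one if/else across all warps.

    conditions[i] is the branch outcome for thread i. A warp pays for the
    'if' path when any of its threads takes it and for the 'else' path when
    any of its threads does not, so a divergent warp pays for both.
    """
    total = 0
    for start in range(0, len(conditions), warp_size):
        warp = conditions[start:start + warp_size]
        if any(warp):
            total += cost_if
        if not all(warp):
            total += cost_else
    return total
```

With path costs 10 and 20 and 64 threads, alternating outcomes (`[i % 2 == 0 for i in range(64)]`) cost 60 cycles, while the same number of true and false lanes grouped by warp (`[i < 32 for i in range(64)]`) cost 30, which is the kind of saving divergence-reducing transformations aim for.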

research-article
Register packing for cyclic reduction: a case study
Article No.: 4, Pages 1–6
https://doi.org/10.1145/1964179.1964185

We generalize a method for avoiding GPU shared communication when dealing with a downsweep pattern. We apply this generalization to Cyclic Reduction, a tridiagonal solver with this pattern. Previously, Cyclic Reduction suffered poor performance when ...
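
For reference, plain cyclic reduction can be sketched as follows (Python, sequential, recursive). Each level eliminates every other unknown, and the updates within a level are independent of one another, which is what makes the method attractive on GPUs. The sketch assumes a system of size n = 2^k - 1 written as a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = d[i] with a[0] = c[n-1] = 0 and nonzero pivots (e.g. a diagonally dominant matrix); it does not show the register-packing optimization studied in the paper, and `cyclic_reduction` is an illustrative name.

```python
from typing import List


def cyclic_reduction(a: List[float], b: List[float], c: List[float],
                     d: List[float]) -> List[float]:
    """Solve a tridiagonal system of size 2**k - 1 by cyclic reduction."""
    n = len(b)
    assert n > 0 and (n + 1) & n == 0, "size must be 2**k - 1"
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: equation i (odd i) absorbs equations i-1 and i+1,
    # eliminating x[i-1] and x[i+1]. All i here could run in parallel.
    odd = range(1, n - 1, 2)
    na, nb, nc, nd = [], [], [], []
    for i in odd:
        alpha = -a[i] / b[i - 1]
        gamma = -c[i] / b[i + 1]
        na.append(alpha * a[i - 1])
        nb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        nc.append(gamma * c[i + 1])
        nd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])

    # Solve the half-size system for the odd unknowns.
    x = [0.0] * n
    for j, xi in zip(odd, cyclic_reduction(na, nb, nc, nd)):
        x[j] = xi

    # Back substitution: each even unknown depends only on its odd neighbors.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x
```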

research-article
Caracal: dynamic translation of runtime environments for GPUs
Article No.: 5, Pages 1–7
https://doi.org/10.1145/1964179.1964186

Graphics Processing Units (GPUs) have become the platform of choice for accelerating a wide range of data-parallel and task-parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high-performance computing market. ...

SESSION: Applications II
research-article
Fast Mersenne prime testing on the GPU
Article No.: 6, Pages 1–8
https://doi.org/10.1145/1964179.1964188

The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform approach (IBDWT) using balanced integers, non-power-of-two transforms, ...
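
The underlying Lucas-Lehmer recurrence is short. The sketch below uses Python's arbitrary-precision integers for the modular squaring. For exponents of practical interest this squaring dominates the cost, and it is the step that an FFT-based approach such as the IBDWT named above is designed to speed up. The function name `lucas_lehmer` is illustrative.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below is for odd p
    m = (1 << p) - 1  # the Mersenne number M_p
    s = 4
    # s_{i+1} = s_i^2 - 2 (mod M_p), applied p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0
```

For example, filtering the primes 3 through 31 with `lucas_lehmer` keeps 3, 5, 7, 13, 17, 19, and 31.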

research-article
Floating-point data compression at 75 Gb/s on a GPU
Article No.: 7, Pages 1–7
https://doi.org/10.1145/1964179.1964189

Numeric simulations often generate large amounts of data that need to be stored or sent to other compute nodes. This paper investigates whether GPUs are powerful enough to make real-time data compression and decompression possible in such environments, ...
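
To illustrate the kind of fast lossless scheme such work builds on, here is a minimal Python sketch. It predicts each double from its predecessor, XORs the two bit patterns, drops the leading zero bytes of that residual, and records how many were dropped in a one-byte header. This is a simplified, swapped-in scheme, not the paper's encoder; the names `compress` and `decompress` and the one-byte header format are assumptions.

```python
import struct
from typing import List


def compress(values: List[float]) -> bytes:
    """Encode doubles as (leading-zero-byte count, remaining residual bytes)."""
    out = bytearray()
    prev = 0
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]  # raw IEEE-754 bits
        residual = bits ^ prev  # zero bits where v matches the previous value
        prev = bits
        raw = residual.to_bytes(8, "big")
        lz = len(raw) - len(raw.lstrip(b"\x00"))  # leading zero bytes, 0..8
        out.append(lz)
        out += raw[lz:]
    return bytes(out)


def decompress(data: bytes) -> List[float]:
    """Invert compress(): rebuild each bit pattern from the running XOR."""
    values = []
    prev = 0
    pos = 0
    while pos < len(data):
        lz = data[pos]
        pos += 1
        residual = int.from_bytes(data[pos:pos + 8 - lz], "big")  # b"" -> 0
        pos += 8 - lz
        prev ^= residual
        values.append(struct.unpack("<d", struct.pack("<Q", prev))[0])
    return values
```

Smooth or repeated data yields residuals with many leading zero bytes: four copies of `1.0` compress to 12 bytes instead of 32.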

research-article
Real-time rendering and dynamic updating of 3-d volumetric data
Article No.: 8, Pages 1–8
https://doi.org/10.1145/1964179.1964190

A dense 3-d terrain model, reconstructed from aerial images, is represented in a probabilistic volumetric framework. A probabilistic representation is chosen to capture the inherent ambiguity in reconstructing a surface from ...
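
To make the probabilistic representation concrete, the sketch below computes the expected intensity seen along one ray through voxels that each store a surface probability and an intensity. A voxel's intensity is weighted by the probability that the surface lies in that voxel and every voxel in front of it is empty. This per-voxel model and the name `expected_intensity` are assumptions for illustration, not the paper's exact appearance model.

```python
from typing import Sequence


def expected_intensity(surface_prob: Sequence[float], intensity: Sequence[float]) -> float:
    """Expected intensity along a ray, voxels ordered front to back.

    visibility = probability that no earlier voxel contains the surface.
    """
    visibility = 1.0
    expected = 0.0
    for p, value in zip(surface_prob, intensity):
        expected += visibility * p * value  # surface here and visible
        visibility *= 1.0 - p               # ray passes through this voxel
    return expected
```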

SESSION: Instrumentation and analysis
research-article
A framework for dynamically instrumenting GPU compute applications within GPU Ocelot
Article No.: 9, Pages 1–9
https://doi.org/10.1145/1964179.1964192

In this paper we present the design and implementation of a dynamic instrumentation infrastructure for PTX programs that procedurally transforms kernels and manages related data structures. We show how performing instrumentation within the GPU Ocelot ...

research-article
Analyzing program flow within a many-kernel OpenCL application
Article No.: 10, Pages 1–8
https://doi.org/10.1145/1964179.1964193

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; ...

research-article
Quantifying NUMA and contention effects in multi-GPU systems
Article No.: 11, Pages 1–7
https://doi.org/10.1145/1964179.1964194

As system architects strive for increased density and power efficiency, the traditional compute node is being augmented with an increasing number of graphics processing units (GPUs). The integration of multiple GPUs per node introduces complex ...

SESSION: Applications III
research-article
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation
Article No.: 12, Pages 1–8
https://doi.org/10.1145/1964179.1964196

We propose a system-independent representation of sparse matrix formats that allows a compiler to generate efficient, system-specific code for sparse matrix operations. To show the viability of such a representation we have developed a compiler that ...
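
As a concrete instance of one sparse format that such a representation must describe, here is compressed sparse row (CSR) storage with a matrix-vector product, in Python. On a GPU the outer loop would typically map to one thread or one warp per row. The helpers `to_csr` and `csr_spmv` are illustrative, not the compiler's generated code.

```python
from typing import List, Sequence, Tuple


def to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense matrix to CSR arrays (row_ptr, col_idx, vals)."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))  # row r's entries are vals[row_ptr[r]:row_ptr[r+1]]
    return row_ptr, col_idx, vals


def csr_spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
             x: Sequence[float]) -> List[float]:
    """y = A x for A in CSR form; each row is an independent dot product."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]  # indirect (gather) access into x
        y[r] = acc
    return y
```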

research-article
Unstructured grid applications on GPU: performance analysis and improvement
Article No.: 13, Pages 1–8
https://doi.org/10.1145/1964179.1964197

Performance of applications running on GPUs is mainly affected by hardware occupancy and global memory latency. Scientific applications that rely on analysis using unstructured grids could benefit from the high performance capabilities provided by GPUs, ...
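
Unstructured-grid codes typically loop over mesh edges and reach cell data through index arrays, so memory accesses are indirect and scattered; this is one reason global memory latency matters. The hypothetical kernel below (Python, with a made-up flux equal to the difference between neighboring cell values) shows that gather/scatter pattern. On a GPU, two edges sharing a cell would race on the scatter unless the edges are colored or atomics are used.

```python
from typing import List, Sequence, Tuple


def edge_residuals(edges: Sequence[Tuple[int, int]], cell_values: Sequence[float],
                   n_cells: int) -> List[float]:
    """Accumulate a hypothetical per-edge flux into the two adjacent cells."""
    residual = [0.0] * n_cells
    for left, right in edges:
        # Gather: cell data is reached indirectly through the edge's indices.
        flux = cell_values[left] - cell_values[right]
        # Scatter: both endpoints are updated, so edges sharing a cell conflict.
        residual[left] -= flux
        residual[right] += flux
    return residual
```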

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%
Workshop     Submitted  Accepted  Rate
GPGPU '20           12         7   58%
GPGPU '19           15         6   40%
GPGPU-10            15         8   53%
GPGPU '16           23         9   39%
GPGPU-7             27        12   44%
GPGPU-6             37        15   41%
Overall            129        57   44%