This is the fifth year for GPGPU. The pace of adoption of GPUs for both high-performance and general-purpose computing domains is accelerating. The class of applications being migrated to these devices is quickly expanding, and with the promise of new platforms on the horizon, we expect this growth to continue. The introduction of AMD Fusion and Intel Sandy Bridge fused CPU/GPU systems is just another catalyst that will accelerate the adoption of GPUs in the near future.
This year we received 35 high-quality submissions, and we are pleased to present the 13 papers that were selected for the final program of GPGPU-5. The goal of this workshop is to provide a forum to discuss general-purpose GPU programming environments and platforms, as well as to describe successful applications that have leveraged this approach to acceleration. This year's workshop focuses on a range of exciting new applications, as well as new programming tools and language extensions.
Proceeding Downloads
Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons
Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, ...
A distributed data-parallel framework for analysis and visualization algorithm development
The coming generation of supercomputing architectures will require fundamental changes in programming models to effectively make use of the expected million to billion way concurrency and thousand-fold reduction in per-core memory. Most current parallel ...
FLAT: a GPU programming framework to provide embedded MPI
To leverage multiple GPUs in a cluster system, it is necessary to assign application tasks to multiple GPUs and execute those tasks while appropriately using communication primitives to handle data transfer among GPUs. In current GPU programming ...
A GPU-based high-throughput image retrieval algorithm
With the development of the Internet and cloud computing, multimedia data, such as images and videos, has become one of the most common data types being processed. As the scale of multimedia data is still increasing, it is vitally important to ...
Dynamic particle system for mesh extraction on the GPU
Extracting isosurfaces represented as high quality meshes from three-dimensional scalar fields is needed for many important applications, particularly visualization and numerical simulations. One recent advance for extracting high quality meshes for ...
High-performance sparse matrix-vector multiplication on GPUs for structured grid computations
In this paper, we address efficient sparse matrix-vector multiplication for matrices arising from structured grid problems with high degrees of freedom at each grid node. Sparse matrix-vector multiplication is a critical step in the iterative solution ...
High performance 3-D FFT using multiple CUDA GPUs
The fast Fourier transform is one of the most important computations used in many kinds of applications. Although there are several works on single-GPU FFT, we also need large-scale transforms that require multiple GPUs due to the capacity of the device ...
Paragon: collaborative speculative loop execution on GPU and CPU
The rise of graphics engines as one of the main parallel platforms for general purpose computing has ignited a wide search for better programming support for GPUs. Due to their non-traditional execution model, developing applications for GPUs is usually ...
JaBEE: framework for object-oriented Java bytecode compilation and execution on graphics processor units
There is an increasing interest from software developers in executing Java and .NET bytecode programs on General Purpose Graphics Processor Units (GPGPUs). Existing solutions have limited support for operations on objects and often require explicit ...
Enabling task-level scheduling on heterogeneous platforms
OpenCL is an industry standard for parallel programming on heterogeneous devices. With OpenCL, compute-intensive portions of an application can be offloaded to a variety of processing units within a system. OpenCL is the first standard that focuses on ...
Auto-tuning interactive ray tracing using an analytical GPU architecture model
This paper presents a method for auto-tuning interactive ray tracing on GPUs using a hardware model. Getting full performance from modern GPUs is a challenging task. Workloads which require a guaranteed performance over several runs must select ...
Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting
Modern system-on-chips are evolving towards complex and heterogeneous platforms with general purpose processors coupled with massively parallel manycore accelerator fabrics (e.g. embedded GPUs). Platform developers are looking for efficient full-system ...
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
The performance of General Purpose Graphics Processing Units (GPGPUs) is frequently limited by the off-chip memory bandwidth. To mitigate this bandwidth wall problem, recent GPUs are equipped with on-chip L1 and L2 caches. However, there has been little ...
- Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units