Graphics cards have long been used to accelerate gaming and 3D graphics applications. More recently, they have begun to accelerate general-purpose and high-performance applications as well. GPUs are now being applied to a wide range of remote sensing, environmental monitoring, business forecasting, and medical imaging applications, though these efforts have relied on programming interfaces built on graphics primitives and libraries. Only recently have general-purpose programming environments become available that allow these platforms to accelerate a wider class of applications.
We are pleased to present the 12 high-quality papers that were selected for the final program of GPGPU-2. The goal of this workshop is to provide a forum to discuss these general-purpose programming environments and platforms, and to describe successful applications that have leveraged this new approach to acceleration. This year's workshop focuses on a range of applications, and also presents new work in GPU languages, optimization techniques, and GPU reliability.
Proceeding Downloads
Accelerating cosmological data analysis with graphics processors
In this paper we describe a successful effort to accelerate the two-point angular correlation function---a basic statistics tool used in the field of cosmology to characterize the distribution of the matter and energy in the Universe---by using an ...
High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs
The visualization of molecular orbitals (MOs) is important for analyzing the results of quantum chemistry simulations. The functions describing the MOs are computed on a three-dimensional lattice, and the resulting data can then be used for plotting ...
GPU acceleration of a production molecular docking code
Modeling the interactions of biological molecules, or docking, is critical to both understanding basic life processes and to designing new drugs. Here we describe the GPU-based acceleration of a recently developed, complex, production docking code. We ...
Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA
Optical Quadrature Microscopy (OQM) is a process which uses phase data to capture information about the sample being studied. OQM is part of an imaging framework developed by the Optical Science Laboratory at Northeastern University. In one particular ...
Performance analysis of accelerated image registration using GPGPU
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming environment to take advantage of the parallel processing capabilities of ...
Accelerating linpack with CUDA on heterogenous clusters
This paper describes the use of CUDA to accelerate the Linpack benchmark on heterogenous clusters, where both CPUs and GPUs are used in synergy with minor or no modifications to the original source code. A host library intercepts the calls to DGEMM and ...
hiCUDA: a high-level directive-based language for GPU programming
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the ...
Architecture-aware optimization targeting multithreaded stream computing
Optimizing program execution targeted for Graphics Processing Units (GPUs) can be very challenging. Our ability to efficiently map serial code to a GPU or stream processing platform is a time consuming task and is greatly hampered by a lack of detail ...
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive systems commonly employ QR decomposition to solve overdetermined least ...
3D finite difference computation on GPUs using CUDA
In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implementation for both the stencil-only computation, as well as the discretization ...
Optimization of tele-immersion codes
- Albert Sidelnik,
- I-Jui Sung,
- Wanmin Wu,
- María Jesús Garzarán,
- Wen-mei Hwu,
- Klara Nahrstedt,
- David Padua,
- Sanjay J. Patel
As computational power increases, tele-immersive applications are an emerging trend. These applications make extensive demands on computational resources through their heavy use of real-time 3D reconstruction algorithms. Since computer vision developers ...
Understanding software approaches for GPGPU reliability
Even though graphics processors (GPUs) are becoming increasingly popular for general purpose computing, current (and likely near future) generations of GPUs do not provide hardware support for detecting soft/hard errors in computation logic or memory ...