Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications
The available computing resources in modern GPUs are growing with each new generation. However, as many general purpose applications with limited thread-scalability are tuned to take advantage of GPUs, available compute resources might not be optimally ...
KMA: A Dynamic Memory Manager for OpenCL
OpenCL is becoming a popular choice for the parallel programming of both multi-core CPUs and GPGPUs. One of the features missing in OpenCL, yet commonly required in irregular parallel applications, is dynamic memory allocation. In this paper, we propose ...
Efficient Instrumentation of GPGPU Applications Using Information Flow Analysis and Symbolic Execution
Dynamic instrumentation of GPGPU binaries makes possible real-time introspection methods for performance debugging, correctness checks, workload characterization, and runtime optimization. Such instrumentation involves inserting code at the instruction ...
Measuring GPU Power with the K20 Built-in Sensor
GPU-accelerated programs are becoming increasingly common in HPC, personal computers, and even handheld devices, making it important to optimize their energy efficiency. However, accurately profiling the power consumption of GPU code is not ...
Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs
Graphics Processing Units (GPUs) have gained recognition as the primary form of accelerators for graphics rendering in the gaming domain. They have also been widely accepted as the computing platform of choice in many scientific and high performance ...
GLZSS: LZSS Lossless Data Compression Can Be Faster
The need for data compression has grown for better utilization of network bandwidth and data storage space. LZ77 is the most widely used data compression method, which has many variants in practical applications. The biggest obstacle that prevents data ...
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
The heap is one of the most important fundamental data structures in computer science. Unfortunately, for a long time heaps did not obtain ideal performance gains from widely used throughput-oriented processors, for two reasons: (1) heap property ...
A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method
This paper presents an optimized CPU-GPU hybrid implementation and a GPU performance model for the kernel-independent fast multipole method (FMM). We implement an optimized kernel-independent FMM for GPUs, and combine it with our previous CPU ...
ParallelJS: An Execution Framework for JavaScript on Heterogeneous Systems
JavaScript has been recognized as one of the most widely used script languages. Optimizations of JavaScript engines on mainstream web browsers enable efficient execution of JavaScript programs on CPUs. However, running JavaScript applications on ...
APR: A Novel Parallel Repacking Algorithm for Efficient GPGPU Parallel Code Transformation
General-purpose graphics processing units (GPGPUs) bring an opportunity to improve the performance of many applications. However, exploiting parallelism has low productivity in current programming frameworks such as CUDA and OpenCL. Programmers have to ...
Power Modeling for Heterogeneous Processors
As power becomes an ever more important design consideration, there is a need for accurate power models at all stages of the design process. While power models are available for CPUs and GPUs, only simple models are available for heterogeneous ...
Exploiting GPU Hardware Saturation for Fast Compiler Optimization
Graphics Processing Units (GPUs) are efficient devices capable of delivering high performance for general purpose computation. Realizing their full performance potential often requires extensive compiler tuning. This process is particularly expensive ...