Proceeding Downloads
Acceleration of the Pre-processing Stage of the MVS Workflow using Graphics Processors
Migrating CPU code to the CUDA programming language has been a challenge for some time. While the code for many high-performance and massively data-parallel applications has been successfully ported to GPUs, this task has received comparatively less ...
Automatic Static Analysis-Guided Optimization of CUDA Kernels
We propose a framework for using static resource analysis to guide the automatic optimization of general-purpose GPU (GPGPU) kernels written in CUDA, NVIDIA's framework for GPGPU programming. In our proposed framework, optimizations are applied to the ...
MUPPET: Optimizing Performance in OpenMP via Mutation Testing
Performance optimization continues to be a challenge in modern HPC software. Existing performance optimization techniques, including profiling-based and auto-tuning techniques, fail to indicate program modifications at the source level, thus preventing ...
Parallel Pattern Language Code Generation
Memory and power constraints limit the current landscape of high-performance computing. Hardware specializations in clusters lead to heterogeneity, Non-Uniform Memory Access (NUMA) effects, and accelerator offloading. These increase the complexity ...
Pure C++ Approach to Optimized Parallel Traversal of Regular Data Structures
In many computational problems, memory throughput is the performance bottleneck. The problem becomes even more pronounced on parallel platforms, where the ratio between computing elements and memory bandwidth shifts towards computing. ...
Zero-Overhead Parallel Scans for Multi-Core CPUs
We present three novel parallel scan algorithms for multi-core CPUs which do not need to fix the number of available cores at the start, and have zero overhead compared to sequential scans when executed on a single core. These two properties are in ...
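For context on the terminology, here is a conventional two-pass blocked inclusive scan (reduce-then-scan) in C++17 using std::thread. It is a textbook scheme, not one of the three algorithms from the paper, and the function name blocked_inclusive_scan is hypothetical. Note that it fixes num_threads before any work starts, which is exactly the restriction the abstract says the paper's algorithms avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical illustration: classic blocked inclusive scan (reduce-then-scan).
// This is NOT the paper's algorithm; it fixes the thread count up front.
std::vector<long long> blocked_inclusive_scan(const std::vector<long long>& in,
                                              std::size_t num_threads) {
    const std::size_t n = in.size();
    std::vector<long long> out(n);
    if (n == 0) return out;
    if (num_threads == 0) num_threads = 1;
    if (num_threads > n) num_threads = n;  // never more blocks than elements
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<long long> block_sums(num_threads, 0);

    // Pass 1: each thread scans its own contiguous block independently
    // and records the block's total.
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(begin + chunk, n);
            if (begin >= end) return;  // trailing empty block
            std::inclusive_scan(in.begin() + begin, in.begin() + end,
                                out.begin() + begin);
            block_sums[t] = out[end - 1];
        });
    }
    for (auto& w : workers) w.join();
    workers.clear();

    // Sequential exclusive scan over block totals gives each block's offset.
    std::vector<long long> offsets(num_threads, 0);
    for (std::size_t t = 1; t < num_threads; ++t)
        offsets[t] = offsets[t - 1] + block_sums[t - 1];

    // Pass 2: every block except the first adds its offset.
    for (std::size_t t = 1; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) out[i] += offsets[t];
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Calling blocked_inclusive_scan({1, 2, 3, 4, 5, 6, 7}, 3) splits the input into blocks of three elements, scans each block locally, then adds the running block totals, yielding {1, 3, 6, 10, 15, 21, 28}. With one thread the second pass does no work, but the thread setup itself is still overhead compared with a plain sequential loop.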
- Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores