Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow
The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, Automotive, Artificial Intelligence, Machine Learning, and other areas necessitates efficient compiler and runtime support for a growing number of ...
Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures
Over the past decade, SIMD (single instruction multiple data) or vector architectures have made significant advances, now existing across a wide range of devices from commodity CPUs to high performance computing (HPC) cores. Intel's AVX (Advanced Vector ...
A performance-oriented comparative study of the Chapel high-productivity language to conventional programming environments
The increase in complexity, diversity and scale of high performance computing environments, as well as the increasing sophistication of parallel applications and algorithms, calls for productivity-aware programming languages for high-performance ...
Beyond worst-case analysis: observed low depth for a P-complete problem
The performance of a simple parallel algorithm for 3CNF Horn SAT is observed. The algorithm requires linear work. The algorithm also exhibits low parallel time (“depth”) for central Horn SAT formulae benchmarks. The work optimality of the algorithm, its ...
Modeling optimization of stencil computations via domain-level properties
Stencil computations are widely used in the scientific simulation domain, and their performance is critical to the overall efficiency of many large-scale numerical applications. Many optimization techniques, most of them varying strategies of tiling and ...
Efficient data race detection of async-finish programs using vector clocks
Existing data race detectors for task-based programs incur significant run time and space overheads. The overheads arise because of frequent lookups in fine-grained tree data structures to check whether two accesses can happen in parallel. This work ...
Integrating a global load balancer to an APGAS distributed collections library
In this paper, we introduce the global load balancer integrated into our distributed collections library for the APGAS for Java programming model. Inspired by the lifeline-based Global Load Balancer scheme, our load balancer makes it possible for ...