Proceeding Downloads
A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs
We take advantage of the new tasking features in OpenMP to propose advanced task-parallel algorithms for the inversion of dense matrices via Gauss-Jordan elimination. Our algorithms perform a partitioning of the matrix operand into two levels of tasks: ...
CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks
Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved great progress in many real-life applications. Meanwhile, because complex model structures clash with strict latency and memory restrictions, the implementation of CNN models on the ...
Porting and Evaluation of a Distributed Task-driven Stencil-based Application
Alternative programming models and runtimes are increasing in popularity and maturity. This allows porting and comparing, on competitive grounds, emerging parallel approaches against the traditional MPI+X paradigm. In this work, an implementation of ...