ABSTRACT
Heterogeneous computing is a key strategy for meeting the requirements of many compute-intensive applications. However, CPU+FPGA platforms are currently commonly underutilized because scheduling is often constrained to a run-to-completion model or to accelerating a single application at a time. To tackle this, this paper proposes heterogeneous resource-elastic scheduling, which maximizes the utilization of both CPU and FPGA resources by transparently and dynamically scaling the resources allocated to each task. For heterogeneous (OpenCL) workloads, it selects the number of compute units, the accelerator type, and the device type, using partial reconfiguration and cooperative fine-grained scheduling to maximize system performance under runtime conditions. We demonstrate up to 2× better performance than SDSoC-like platforms and, on average, a 20% performance improvement over other standard scheduling algorithms, while also lowering task wait times. Our results indicate that: 1) workloads can be executed seamlessly on both CPU and FPGA without increasing programming effort, and 2) co-scheduling applications on heterogeneous systems can improve system performance.
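The core idea of resource-elastic scheduling can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' scheduler. It assumes each OpenCL task is a pool of work-groups and that, at every cooperative scheduling point, the scheduler re-divides a fixed number of FPGA slots and CPU cores among the tasks that still have work. A task's allocation therefore grows as soon as other tasks finish. The names `Task`, `allocate`, `simulate`, `fpga_slots`, `cpu_cores`, `fpga_rate`, and `cpu_rate`, as well as the "fill FPGA slots first, then CPU cores, round-robin, capped by remaining work" policy, are all hypothetical. The sketch does not model partial-reconfiguration overhead or the choice between accelerator variants.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work_groups: int              # remaining OpenCL work-groups (hypothetical unit of work)
    has_fpga_kernel: bool = True  # False: the task can only run on CPU cores


def allocate(tasks, fpga_slots, cpu_cores, fpga_rate, cpu_rate):
    """Re-divide FPGA slots and CPU cores among tasks that still have work.

    Hypothetical policy: units are handed out round-robin, and a task only
    receives another unit while its per-step capacity is below its remaining
    work-groups, so no unit is wasted on a task that is about to finish.
    """
    active = [t for t in tasks if t.work_groups > 0]
    alloc = {t.name: {"fpga": 0, "cpu": 0} for t in active}

    def capacity(t):
        a = alloc[t.name]
        return a["fpga"] * fpga_rate + a["cpu"] * cpu_rate

    def hand_out(units, kind, eligible):
        i = 0
        while units > 0:
            hungry = [t for t in eligible if capacity(t) < t.work_groups]
            if not hungry:
                break
            alloc[hungry[i % len(hungry)].name][kind] += 1
            units -= 1
            i += 1

    # FPGA slots first (assumed faster per unit), then CPU cores for leftover demand.
    hand_out(fpga_slots, "fpga", [t for t in active if t.has_fpga_kernel])
    hand_out(cpu_cores, "cpu", active)
    return alloc


def simulate(tasks, fpga_slots, cpu_cores, fpga_rate=4, cpu_rate=1):
    """Return {task name: step at which it finished}.

    Each step models one cooperative scheduling point: every allocated unit
    processes its share of work-groups, then control returns to the scheduler,
    which recomputes the allocation (this is where elasticity happens).
    """
    finish = {}
    step = 0
    while any(t.work_groups > 0 for t in tasks):
        alloc = allocate(tasks, fpga_slots, cpu_cores, fpga_rate, cpu_rate)
        progress = 0
        for t in tasks:
            if t.name not in alloc:
                continue
            a = alloc[t.name]
            done = min(t.work_groups, a["fpga"] * fpga_rate + a["cpu"] * cpu_rate)
            t.work_groups -= done
            progress += done
            if t.work_groups == 0:
                finish[t.name] = step + 1
        if progress == 0:
            raise RuntimeError("no resources available for the remaining tasks")
        step += 1
    return finish


if __name__ == "__main__":
    # Two tasks share 4 FPGA slots and 2 CPU cores.
    print(simulate([Task("A", 32), Task("B", 8)], fpga_slots=4, cpu_cores=2))
    # prints {'B': 1, 'A': 3}
```

In this run, task B initially holds two FPGA slots. Because the allocation is recomputed at every step, those slots move to task A as soon as B finishes. A run-to-completion scheduler would instead keep A at its initial share until A ends.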
Index Terms
- Heterogeneous Resource-Elastic Scheduling for CPU+FPGA Architectures