A Scalable Distributed Radix Sorter for FPGA Clusters using High-Bandwidth Memory Networks | IEEE Conference Publication | IEEE Xplore

Abstract:

A modern FPGA card can be equipped with high-bandwidth memory such as HBM2. Because memory capacity on a single FPGA is limited, highly parallel data processing becomes crucial on FPGAs that are tightly coupled through high-density optical integration, e.g., onboard Si-Photonics transceivers. This study presents a scalable distributed radix sorter and implements it on an eight-FPGA cluster. Each custom Stratix10 MX2100 FPGA card has 819-Gbps memory bandwidth from two HBM2 memories and 800-Gbps network bandwidth from eight custom embedded optical modules. Existing FPGA sorters typically rely on merge sort. However, merge sort has a severe performance bottleneck at the final merge stage, which prevents it from making full use of the high memory-to-memory bandwidth of the FPGA cluster. Instead, we implement a radix sort for a 32-bit key range consisting of eight 4-bit counting sorts optimized for the memory-network structure. Each counting sort reads and writes memory only once, using global and local pipelines. We demonstrated a sorting throughput of 37.2 GB/s.
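The core idea above, a radix sort over 32-bit keys built from eight 4-bit counting sorts spread across several nodes, can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the authors' FPGA design: each inner list stands in for one FPGA's local memory, the split into a per-node histogram ("local") and a cluster-wide prefix sum and scatter ("global") is a guess at what the two pipeline levels might do, and the names `counting_pass` and `distributed_radix_sort` are hypothetical.

```python
from typing import List

RADIX_BITS = 4                  # one 4-bit digit per counting sort
NUM_BUCKETS = 1 << RADIX_BITS   # 16 buckets per pass
KEY_BITS = 32                   # 32-bit keys -> 32 / 4 = 8 passes
MASK = NUM_BUCKETS - 1


def counting_pass(nodes: List[List[int]], shift: int) -> List[List[int]]:
    """One stable counting sort on the 4-bit digit at `shift`.

    Hypothetical model: each inner list represents one node's local memory,
    and the concatenation of all nodes (in node order) is the global sequence.
    """
    num_nodes = len(nodes)

    # Local phase (assumed): each node histograms the current digit of its own keys.
    hist = [[0] * NUM_BUCKETS for _ in range(num_nodes)]
    for p, keys in enumerate(nodes):
        for k in keys:
            hist[p][(k >> shift) & MASK] += 1

    # Global phase (assumed): exclusive prefix sum ordered by (digit, node).
    # For equal digits, node 0's keys land before node 1's, which keeps the
    # pass stable with respect to the concatenated node order.
    offset = [[0] * NUM_BUCKETS for _ in range(num_nodes)]
    running = 0
    for d in range(NUM_BUCKETS):
        for p in range(num_nodes):
            offset[p][d] = running
            running += hist[p][d]

    # Scatter phase: each key is written exactly once to its global position.
    out = [0] * running
    for p, keys in enumerate(nodes):
        for k in keys:
            d = (k >> shift) & MASK
            out[offset[p][d]] = k
            offset[p][d] += 1

    # Hand contiguous blocks of the global result back to the nodes,
    # keeping each node's key count unchanged.
    result, start = [], 0
    for keys in nodes:
        result.append(out[start:start + len(keys)])
        start += len(keys)
    return result


def distributed_radix_sort(nodes: List[List[int]]) -> List[List[int]]:
    """LSD radix sort of 32-bit keys as eight 4-bit counting passes."""
    for shift in range(0, KEY_BITS, RADIX_BITS):  # least significant digit first
        nodes = counting_pass(nodes, shift)
    return nodes


if __name__ == "__main__":
    import random

    random.seed(0)
    cluster = [[random.getrandbits(32) for _ in range(1000)] for _ in range(8)]
    expected = sorted(k for keys in cluster for k in keys)
    result = distributed_radix_sort(cluster)
    assert [k for keys in result for k in keys] == expected
    print("sorted", len(expected), "keys on", len(result), "simulated nodes")
```

Ordering the global offsets by digit first and node second is what makes each pass stable over the whole cluster, which LSD radix sort requires for the eight passes to compose into a full sort. Unlike the design described in the abstract, this sketch reads every key twice per pass (once to build the histogram, once to scatter), so it does not model the single read/write per counting sort that the hardware pipelines achieve.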
Date of Conference: 15-18 May 2022
Date Added to IEEE Xplore: 03 June 2022
Conference Location: New York City, NY, USA