ABSTRACT
A hardware module called a merge network is the key component of FPGA-based sorting accelerators. We therefore propose a novel merge network, built from compare-and-swap operations, for high-performance sorting accelerators. Our proposal builds on the state-of-the-art merge network and aims to mitigate its main drawback: the maximum wiring delay and the maximum fanout both increase as the number of records output per cycle increases.
We implement several merge networks that adopt the proposal on a Virtex-7 FPGA. The evaluation results show that the maximum fanout of the proposal is constant and that its maximum wiring delay is almost constant. Owing to these properties, the largest configuration of the proposal achieves 1.43x higher throughput than the state-of-the-art merge network.
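To make the term concrete, the sketch below is a small software model of a merge network built only from compare-and-swap operations. It uses the well-known bitonic merger as a stand-in and is not the architecture proposed here; the function names `compare_and_swap` and `bitonic_merge` are hypothetical, and the model says nothing about wiring delay or fanout.

```python
def compare_and_swap(values, i, j):
    """Model one compare-and-swap unit: ensure values[i] <= values[j]."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def bitonic_merge(sorted_a, sorted_b):
    """Merge two ascending lists of equal power-of-two length.

    Illustrative bitonic merger (an assumption for this sketch, not the
    network proposed in the abstract).
    """
    n = len(sorted_a)
    if n == 0 or n != len(sorted_b) or n & (n - 1):
        raise ValueError("inputs must have equal, power-of-two length")

    # Ascending run followed by a descending run forms a bitonic sequence.
    values = list(sorted_a) + list(reversed(sorted_b))

    # Each pass of the while loop corresponds to one stage of the network;
    # all compare-and-swap units within a stage are independent.
    stride = n
    while stride >= 1:
        for i in range(len(values)):
            if i & stride == 0:
                compare_and_swap(values, i, i + stride)
        stride //= 2
    return values


merged = bitonic_merge([1, 4, 6, 9], [2, 3, 7, 8])
print(merged)
```

In hardware, every stage would be a row of comparators operating in parallel, so the number of stages (here log2 of the total input length) determines the depth of the network.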