research-article

A Parallel Sliding-Window Generator for High-Performance Digital-Signal Processing on FPGAs

Authors:
Greg Stitt, University of Florida, Gainesville, FL
Eric Schwartz, University of Florida, Gainesville, FL
Patrick Cooke, University of Florida, Gainesville, FL

ACM Transactions on Reconfigurable Technology and Systems, Volume 9, Issue 3, Article No. 23, pp. 1–22. https://doi.org/10.1145/2800789

Published: 20 May 2016


Abstract

Sliding-window applications, an important class within the digital-signal processing domain, are highly amenable to pipeline parallelism on field-programmable gate arrays (FPGAs). Although memory bandwidth restricts parallelism in many applications, sliding-window applications can leverage custom buffers, referred to as sliding-window generators, which provide massive input bandwidth far exceeding the capabilities of external memory. Previous work has introduced a variety of sliding-window generators, but those approaches typically generate at most one window per cycle, which significantly restricts parallelism. In this article, we address this limitation with a parallel sliding-window generator that can generate a configurable number of windows every cycle. Although in practice the number of parallel windows is limited by memory bandwidth, we show that even with common bandwidth limitations, the presented generator achieves near-linear speedups of up to 16x over previous FPGA studies that generate a single window per cycle, which in some cases were already faster than graphics-processing units and microprocessors.
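To make the idea of a generator that emits several windows per cycle more concrete, here is a minimal behavioral sketch in Python. It is our own illustration under simplifying assumptions, not the authors' hardware design or released source: it models only a 1-D stream with stride 1 (2-D image windows would additionally need row buffering, which is omitted), and the function name `parallel_windows` and its parameters `window` and `par` are hypothetical. Each "cycle" reads `par` new samples and emits up to `par` complete windows, so the buffer delivers up to `par * window` samples per cycle to the datapath while reading only `par` samples from memory.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, List


def parallel_windows(samples: Iterable[int], window: int,
                     par: int) -> Iterator[List[List[int]]]:
    """Hypothetical behavioral model of a 1-D parallel sliding-window generator.

    Each iteration models one clock cycle: up to `par` new samples are read
    from "memory" and every complete stride-1 window of length `window` that
    ends at one of those new samples is emitted. Only the last
    `window - 1 + par` samples are buffered, so each sample is read once
    but can appear in up to `window` windows (the bandwidth amplification).
    """
    if window < 1 or par < 1:
        raise ValueError("window and par must be positive")
    buf: deque = deque(maxlen=window - 1 + par)  # models the on-chip buffer
    it = iter(samples)
    while True:
        chunk = list(islice(it, par))  # samples read from memory this cycle
        if not chunk:
            return
        buf.extend(chunk)  # oldest samples fall out once the buffer is full
        data = list(buf)
        first_new = len(data) - len(chunk)
        cycle_out = []
        for idx in range(first_new, len(data)):
            start = idx - window + 1
            if start >= 0:  # skip windows that are not yet full (pipeline fill)
                cycle_out.append(data[start:idx + 1])
        yield cycle_out


if __name__ == "__main__":
    for cycle, wins in enumerate(parallel_windows(range(10), window=3, par=2)):
        print(cycle, wins)
```

With `par=1`, the model degenerates to the one-window-per-cycle behavior that the abstract attributes to previous generators; raising `par` increases the number of windows per cycle, but also the number of samples that must be read from memory each cycle, which is why memory bandwidth bounds the practical parallelism.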

Index Terms

1. Hardware
  1. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 9, Issue 3
Special Issue on Reconfigurable Components with Source Code
September 2016
128 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/2940351
Editor:
Steve Wilton
Department of Electrical and Computer Engineering, University of British Columbia, Kaiser 4112, 5500-2332 Main Mall, Vancouver, BC V6T 1Z4, Canada
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 May 2016
- Accepted: 1 June 2015
- Revised: 1 April 2015
- Received: 1 November 2014
Author Tags
FPGA
parallelism
pipelining
sliding-window applications
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- 5 Total Citations
- 350 Total Downloads
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 3