research-article

A Parallel Sliding-Window Generator for High-Performance Digital-Signal Processing on FPGAs

Published: 20 May 2016

Abstract

Sliding-window applications, an important class within the digital-signal processing domain, are highly amenable to pipeline parallelism on field-programmable gate arrays (FPGAs). Although memory bandwidth restricts parallelism in many applications, sliding-window applications can leverage custom buffers, referred to as sliding-window generators, that provide input bandwidth far exceeding that of external memory. Previous work has introduced a variety of sliding-window generators, but these typically generate at most one window per cycle, which significantly restricts parallelism. In this article, we address this limitation with a parallel sliding-window generator that can generate a configurable number of windows every cycle. Although in practice the number of parallel windows is limited by memory bandwidth, we show that even under common bandwidth limitations, the presented generator achieves near-linear speedups of up to 16x over FPGA implementations from previous studies that generate a single window per cycle, which were themselves already faster than graphics-processing units and microprocessors in some cases.
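To make the bandwidth argument concrete, the following is a minimal Python software model of a one-dimensional parallel sliding-window generator. It is a simplified illustration under our own assumptions, not the FPGA architecture presented in the article; the function name `parallel_windows` and its parameters `window` (window length W) and `parallel` (windows per cycle P) are hypothetical. Each loop iteration stands for one clock cycle: it reads only P new elements from external memory yet emits up to P windows totalling P * W elements, because the last W - 1 elements are kept in an on-chip buffer and reused.

```python
from typing import Iterator, List, Sequence


def parallel_windows(
    stream: Sequence[int], window: int, parallel: int
) -> Iterator[List[List[int]]]:
    """Hypothetical software model of a 1-D parallel sliding-window generator.

    Each iteration models one cycle: it consumes `parallel` (P) new inputs
    and yields every complete window of length `window` (W) that ends
    within those inputs. In steady state that is exactly P windows.
    """
    if window < 1 or parallel < 1:
        raise ValueError("window and parallel must be >= 1")
    # Models the on-chip buffer: holds the last W - 1 inputs between cycles.
    buf: List[int] = []
    for start in range(0, len(stream), parallel):
        # Only P new elements are read from "external memory" this cycle.
        chunk = list(stream[start:start + parallel])
        data = buf + chunk
        # All complete windows ending in this cycle's chunk; the first cycle
        # yields fewer than P while the buffer is still filling.
        windows = [data[i:i + window] for i in range(len(data) - window + 1)]
        if windows:
            yield windows
        # Retain the most recent W - 1 elements for reuse next cycle.
        buf = data[-(window - 1):] if window > 1 else []


if __name__ == "__main__":
    # 8 inputs, W = 3, P = 4: two modeled cycles.
    for cycle, wins in enumerate(parallel_windows(list(range(8)), window=3, parallel=4)):
        print(cycle, wins)
    # 0 [[0, 1, 2], [1, 2, 3]]
    # 1 [[2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
```

In steady state, each modeled cycle needs only P fresh inputs while delivering P * W window elements, so the output bandwidth is W times the memory bandwidth. This is consistent with the abstract's observation that external memory bandwidth ultimately limits the number of parallel windows P rather than the window size.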


    • Published in

      ACM Transactions on Reconfigurable Technology and Systems  Volume 9, Issue 3
      Special Issue on Reconfigurable Components with Source Code
      September 2016
      128 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/2940351
      • Editor: Steve Wilton

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 May 2016
      • Accepted: 1 June 2015
      • Revised: 1 April 2015
      • Received: 1 November 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
