Abstract
Applications based on the Discrete Fourier Transform (DFT) are used extensively in signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows, followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have used this scheme for high-performance implementations of the 2D DFT. However, architectures based on RC decomposition are inefficient for large input sizes because of memory bandwidth constraints. In this paper, we propose an efficient architecture that implements the 2D DFT for large input data, based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the parallelism inherent in the decomposition and by utilizing the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to enable maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K × 2K input size, the proposed architecture is 1.96 times faster than an RC-decomposition-based implementation under the same memory constraints, and it also outperforms other existing implementations.
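For readers unfamiliar with the baseline, the following is a minimal software sketch of the conventional Row-Column (RC) decomposition described above. It is not the proposed 2D decomposition or its hardware architecture; the function names (`dft_1d`, `dft_2d_row_column`, `dft_2d_direct`) are hypothetical, and a naive O(N²) 1D DFT stands in for an FFT to keep the example short.

```python
import cmath

def dft_1d(x):
    """Naive 1D DFT of a sequence (illustrative stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]

def dft_2d_row_column(a):
    """2D DFT by RC decomposition: 1D DFTs on rows, then on columns."""
    # Pass 1: transform every row.
    rows = [dft_1d(row) for row in a]
    # Pass 2: transform every column of the intermediate result.
    # zip(*rows) walks the data column by column, i.e. with a stride
    # of one full row in a row-major layout.
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as out[u][v].
    return [list(r) for r in zip(*cols)]

def dft_2d_direct(a):
    """Direct evaluation of the 2D DFT definition, used only as a check."""
    n1, n2 = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n1 + v * c / n2))
                for r in range(n1)
                for c in range(n2)
            )
            for v in range(n2)
        ]
        for u in range(n1)
    ]
```

The second pass reads the intermediate data column by column. In a row-major memory layout this is a strided access pattern, in contrast to the row-wise burst accesses that the proposed architecture is designed around.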
Acknowledgements
This work is supported in part by a grant from DARPA W911NF-05-1-0248. In addition, the authors gratefully acknowledge the help of Dr. Nikos Pitsianis and Dr. Xiaobai Sun of Duke University.
Additional information
This paper is an extension of our paper that appeared in SIPS ’09. The added sections are: (1) Impact of large data size on conventional 2D DFT architecture (Section 2.2); (2) Detailed descriptions of the infrastructure components of the FPGA platform (Section 4.2); (3) Detailed description of the automatic 2D DFT system generator (Section 5); (4) Accuracy analysis of the 2D DFT (Section 6.4).
About this article
Cite this article
Yu, CL., Kim, JS., Deng, L. et al. FPGA Architecture for 2D Discrete Fourier Transform Based on 2D Decomposition for Large-sized Data. J Sign Process Syst 64, 109–122 (2011). https://doi.org/10.1007/s11265-010-0500-y
DOI: https://doi.org/10.1007/s11265-010-0500-y