Abstract
We present a novel 4096 complex-point, fully systolic VLSI FFT architecture built from 64-point FFT engines, each of which combines three consecutive radix-4 stages. Cascading these 64-point engines yields an architecture that efficiently processes large input data sets in real time. Using 64-point FFT engines reduces the buffering and the latency to one third of those of a fully unfolded radix-4 architecture, while the radix-4 scheme simplifies the calculations within each engine. The proposed 4096 complex-point architecture has been implemented on an FPGA, achieving a post-route clock frequency of 200 MHz and a sustained throughput of 4096 points per 20.48 μs. It has also been implemented in a high-performance 0.13 μm, 1P8M CMOS process, achieving a worst-case (0.9 V, 125 °C) post-route clock frequency of 604.5 MHz and a sustained throughput of 4096 points per 3.89 μs while consuming 4.4 W. The architecture is extended to FFT computations of 16K, 64K and 256K complex points at operating frequencies of 352, 256 and 188 MHz, respectively.
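The decomposition described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes a standard Cooley-Tukey factorisation of 4096 = 64 × 64, with each 64-point transform computed by a recursive radix-4 decimation-in-time routine (three radix-4 levels, since 64 = 4³). The function names `fft_radix4` and `fft_4096` are hypothetical and chosen only for this illustration.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    # Transform the four interleaved sub-sequences at length n/4.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Apply the inter-stage twiddle factors W_n^(r*k).
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        # 4-point butterfly: only multiplications by 1, -1, j and -j.
        out[k] = t[0] + t[1] + t[2] + t[3]
        out[k + q] = t[0] - 1j * t[1] - t[2] + 1j * t[3]
        out[k + 2 * q] = t[0] - t[1] + t[2] - t[3]
        out[k + 3 * q] = t[0] + 1j * t[1] - t[2] - 1j * t[3]
    return out


def fft_4096(x):
    """4096-point FFT as two passes of 64-point FFTs (illustrative sketch)."""
    n_total, m = 4096, 64
    assert len(x) == n_total
    # First pass: one 64-point FFT per residue class n2 = n mod 64.
    cols = [fft_radix4(x[n2::m]) for n2 in range(m)]
    out = [0j] * n_total
    for k1 in range(m):
        # Twiddle by W_4096^(n2*k1), then a second 64-point FFT over n2.
        row = [cols[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / n_total)
               for n2 in range(m)]
        y = fft_radix4(row)
        for k2 in range(m):
            out[k1 + m * k2] = y[k2]
    return out
```

Calling `fft_4096` on a list of 4096 complex samples returns the transform in natural order. As a side check on the FPGA figure, and assuming one complex sample is accepted per clock cycle, 4096 samples at 200 MHz take 4096 / 200 MHz = 20.48 μs.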
Cite this article
Babionitakis, K., Chouliaras, V.A., Manolopoulos, K. et al. Fully Systolic FFT Architecture for Giga-sample Applications. J Sign Process Syst Sign Image Video Technol 58, 281–299 (2010). https://doi.org/10.1007/s11265-009-0364-1
DOI: https://doi.org/10.1007/s11265-009-0364-1