A Low-Memory-Access Length-Adaptive Architecture for 2$^n$-Point FFT

Chen, Kuan-Hung

doi:10.1007/s00034-014-9862-x

Published: 12 August 2014

Volume 34, pages 459–482, (2015)

Circuits, Systems, and Signal Processing

Kuan-Hung Chen¹

Abstract

The fast Fourier transform (FFT) is widely used in modern wireless communication and digital signal processing. Because memory access is a major source of power dissipation in long-length FFT architectures, this paper explores in detail the design space spanned by FFT size and radix and presents a novel low-memory-access, length-adaptive architecture for computing any long-length 2$^n$-point FFT. The proposed hardware has three features that distinguish it from existing designs. First, memory is identified as the dominant consumer of energy in an FFT processor, and memory access is reduced by decreasing the number of FFT butterfly stages. Second, the design concept of programmable processors is adopted so that the hardware can be dynamically configured for variable-length FFTs without sacrificing hardware utilization, in contrast to the feed-forward architecture. Third, a 16-bank memory organization is proposed to achieve conflict-free FFT operation for various radixes. This low-memory-access, length-adaptive architecture can reduce memory access by almost 70 %, or power consumption by 30 %, for FFT computation. Implemented in 1P6M TSMC 0.18-$\mu$m CMOS technology, the design occupies a core area of only 4.49 mm$^{2}$ and meets the real-time FFT performance requirements of DVB-T2 systems when operated at 20 MHz. It consumes only 1.44 nJ of energy per sample for computing FFTs. By adopting the proposed low-memory-access algorithm, flexible length-adaptive architecture, and efficient 16-bank memory organization, 56 % of the power dissipation of the whole FFT chip can be saved.
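The link between radix and memory traffic can be illustrated with a rough count of butterfly stages. The Python sketch below is not the paper's accounting: it assumes a memory-based design in which every stage reads and writes all $N$ points exactly once, so that memory traffic scales with the number of stages. The function names and the example length $2^{13}$ are hypothetical choices made for illustration.

```python
def stage_count(log2_n: int, log2_radix: int) -> int:
    """Number of butterfly stages for a 2**log2_n-point FFT when each stage
    resolves at most log2_radix index bits (the last stage may be smaller)."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-log2_n // log2_radix)


def relative_memory_access(log2_n: int, log2_radix: int) -> float:
    """Memory traffic relative to a pure radix-2 design.

    ASSUMPTION (not from the paper): each stage reads and writes all N
    points once, so traffic is proportional to the stage count.
    """
    return stage_count(log2_n, log2_radix) / stage_count(log2_n, 1)


if __name__ == "__main__":
    log2_n = 13  # hypothetical example length, N = 8,192
    for log2_radix in (1, 2, 3, 4):  # radix-2, radix-4, radix-8, radix-16
        stages = stage_count(log2_n, log2_radix)
        ratio = relative_memory_access(log2_n, log2_radix)
        print(f"radix-{2 ** log2_radix}: {stages} stages, "
              f"{ratio:.2f} of radix-2 traffic")
```

Under this simplified model, a higher radix lowers the stage count and therefore the number of passes over memory. The figures reported in the abstract come from the paper's own analysis and measurements, which this sketch does not attempt to reproduce.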

Acknowledgments

The author would like to thank the National Chip Implementation Center (CIC) of Taiwan for its help with chip fabrication and measurement.

Author information

Authors and Affiliations

Department of Electronic Engineering, Feng-Chia University, 100, Wenhwa Rd., Seatwen, Tai-chung, 40724, Taiwan, ROC
Kuan-Hung Chen

Corresponding author

Correspondence to Kuan-Hung Chen.

Appendix: Fetching Data for Radix-2, Radix-4, and Radix-8 FFTs

This appendix gives the addressing modes used to fetch data from memory into the data-path for FFT operation based on radix-2, radix-4, and radix-8, respectively. The first mode is for the radix-2 FFT, in which $x(n)$ is fetched from the address Addr[$x(n)$] inside the memory banks Banks[$x(n)$] and passed to the FFT data-path.

$$\begin{aligned} \text{Addr}\left[ x(n) \right] = (\#\text{cycle}) \bmod (N/16) \end{aligned}$$

(22)

$$\begin{aligned} \text{Banks}\left[ x(n) \right] = \left\lfloor \frac{\#\text{cycle}}{N/16} \right\rfloor \quad \text{and} \quad \left\lfloor \frac{\#\text{cycle}}{N/16} \right\rfloor + 8 \end{aligned}$$

(23)

The partial timing diagram is shown in Fig. 11. The figure shows that the numbers of the active pair of banks stay unchanged for 512 cycles, which corresponds to the interval $N/16$ in Eq. (23) for $N = 8{,}192$, while the address increments every clock cycle. After all the data in the two active banks have been fetched, the bank numbers increment and fetching continues from the next pair of banks.
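The radix-2 fetch pattern of Eqs. (22) and (23) can be written out as a short Python sketch. The function name `radix2_fetch` is hypothetical, and the length $N = 8{,}192$ used in the example is an assumption, chosen so that $N/16 = 512$ matches the 512-cycle interval described above.

```python
from typing import Tuple


def radix2_fetch(cycle: int, n: int) -> Tuple[int, Tuple[int, int]]:
    """Return (address, (bank_a, bank_b)) for one radix-2 fetch cycle."""
    span = n // 16              # N/16, the divisor used in Eqs. (22) and (23)
    addr = cycle % span         # Eq. (22): address wraps every N/16 cycles
    bank = cycle // span        # Eq. (23): first bank of the pair
    return addr, (bank, bank + 8)  # Eq. (23): second bank is offset by 8


if __name__ == "__main__":
    n = 8192  # assumed length, so that N/16 = 512
    for cycle in (0, 1, 511, 512, 4095):
        print(cycle, radix2_fetch(cycle, n))
```

With this assumed length, cycles 0 to 511 read banks 0 and 8 at addresses 0 to 511, and cycle 512 moves on to banks 1 and 9 at address 0.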

The second mode is for the radix-4 FFT, in which $x(n)$ is fetched from the following memory locations and passed to the FFT data-path.

$$\begin{aligned} \text{Addr}\left[ x(n) \right] = \left\lfloor \frac{(\#\text{cycle}) \bmod (N/16)}{2} \right\rfloor \end{aligned}$$

(24)

$$\begin{aligned} \text{Banks}\left[ x(n) \right] &= \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor \cdot \overline{(\#\text{cycle}) \bmod 2},\quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 8 \right) \cdot \overline{(\#\text{cycle}) \bmod 2}, \\ &\quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 4 \right) \cdot \left[ (\#\text{cycle}) \bmod 2 \right],\quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 12 \right) \cdot \left[ (\#\text{cycle}) \bmod 2 \right] \end{aligned}$$

(25)

The partial timing diagram of data fetching for FFT operation based on the radix-4 algorithm is shown in Fig. 12. The figure shows that two pairs of banks form a basic unit whose data are fetched in order over 1,024 cycles. Because the two pairs are accessed on alternate cycles, the address increments every two clock cycles. After all the data in the active unit have been fetched, the bank numbers increment and fetching continues from the next unit.
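The radix-4 pattern can be sketched in the same way. The Python fragment below transcribes Eqs. (24) and (25) as printed, reading the factors $\overline{(\#\text{cycle}) \bmod 2}$ and $[(\#\text{cycle}) \bmod 2]$ as selectors for even and odd cycles; that reading, the function name `radix4_fetch`, and the length $N = 8{,}192$ are assumptions made for illustration.

```python
from typing import Tuple


def radix4_fetch(cycle: int, n: int) -> Tuple[int, Tuple[int, int]]:
    """Return (address, (bank_a, bank_b)) for one radix-4 fetch cycle."""
    addr = (cycle % (n // 16)) // 2   # Eq. (24): increments every two cycles
    base = cycle // (n // 8)          # floor(#cycle / (N/8)) from Eq. (25)
    if cycle % 2 == 0:
        # Terms gated by the complement of (#cycle mod 2): even cycles.
        banks = (base, base + 8)
    else:
        # Terms gated by (#cycle mod 2): odd cycles.
        banks = (base + 4, base + 12)
    return addr, banks


if __name__ == "__main__":
    n = 8192  # assumed length, so that N/8 = 1,024
    for cycle in (0, 1, 2, 3, 1024, 1025):
        print(cycle, radix4_fetch(cycle, n))
```

With this assumed length, cycles 0 and 1 read address 0 from banks (0, 8) and (4, 12), cycles 2 and 3 read address 1 from the same two pairs, and cycle 1,024 switches to the next unit starting with banks (1, 9).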

Finally, for the radix-8 FFT, $x(n)$ is fetched from the memory locations shown in Fig. 13 and passed to the FFT data-path. The figure shows that four pairs of banks form a basic unit whose data are fetched in order over 2,048 cycles, so the address increments every four clock cycles. After all the data in the active unit have been fetched, the bank numbers increment and fetching continues from the next unit.

About this article

Cite this article

Chen, KH. A Low-Memory-Access Length-Adaptive Architecture for 2$^n$-Point FFT. Circuits Syst Signal Process 34, 459–482 (2015). https://doi.org/10.1007/s00034-014-9862-x

Received: 17 January 2014
Revised: 10 July 2014
Accepted: 10 July 2014
Published: 12 August 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s00034-014-9862-x
