A Low-Memory-Access Length-Adaptive Architecture for 2\(^n\)-Point FFT

Circuits, Systems, and Signal Processing

Abstract

The fast Fourier transform (FFT) is widely used in modern wireless communication and digital signal processing. Because memory access is a major source of power dissipation in long-length FFT architectures, this paper explores in detail the design space spanned by FFT size and radix, and presents a novel low-memory-access length-adaptive architecture for computing any long-length 2\(^n\)-point FFT. Compared with existing designs, the proposed hardware has three attractive features. First, having identified that memory accounts for a major share of the energy dissipated by an FFT processor, we reduce memory access by decreasing the number of FFT butterfly stages. Second, we adopt the design concept of programmable processors, which provides the flexibility to configure the hardware dynamically for variable-length FFTs without sacrificing hardware utilization, in contrast to the feed-forward architecture. Third, a 16-bank memory organization is proposed to achieve conflict-free FFT operations for various radices. This low-memory-access length-adaptive architecture can reduce memory access by almost 70 % or power consumption by 30 % for FFT computation. Implemented in 1P6M TSMC 0.18-\(\upmu \)m CMOS technology, the design occupies a core area of only 4.49 mm\(^{2}\) and meets the real-time FFT performance requirements of DVB-T2 systems when operated at 20 MHz. It consumes only 1.44 nJ of energy per sample for computing FFTs. By adopting the proposed low-memory-access algorithm, flexible length-adaptive architecture, and efficient 16-bank memory organization, 56 % of the power dissipation of the whole FFT chip can be saved.

Acknowledgments

The author would like to thank the National Chip Implementation Center (CIC) of Taiwan for its help with chip fabrication and measurement.

Author information

Corresponding author

Correspondence to Kuan-Hung Chen.

Appendix: Fetching Data for Radix-2, Radix-4, and Radix-8 FFTs

This appendix gives the addressing modes used to fetch data from memory into the FFT data-path for the radix-2, radix-4, and radix-8 algorithms. For the radix-2 FFT, the samples \(x(n)\) are fetched from the address Addr[\(x(n)\)] inside the memory banks Banks[\(x(n)\)] given below, where \(\#\)cycle is the clock-cycle count.

$$\begin{aligned} \text{Addr}\left[ x(n) \right] = (\#\text{cycle}) \bmod (N/16) \end{aligned}$$
(22)
$$\begin{aligned} \text{Banks}\left[ x(n) \right] = \left\lfloor \frac{\#\text{cycle}}{N/16} \right\rfloor \quad \text{and} \quad \left\lfloor \frac{\#\text{cycle}}{N/16} \right\rfloor + 8 \end{aligned}$$
(23)

The partial timing diagram is shown in Fig. 11. The bank numbers of the active pair remain unchanged for 512 cycles, which corresponds to \(N/16\) in Eq. (23) for \(N = 8{,}192\), while the address increments every clock cycle. After all the data in the two active banks have been fetched, the bank numbers increment and fetching continues from the next pair of banks.

Fig. 11 The timing diagram of data fetching for FFT operation based on the radix-2 algorithm
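As a rough illustration of Eqs. (22) and (23), the following Python sketch enumerates the radix-2 fetch schedule. The function names `radix2_fetch` and `radix2_schedule`, the stage length of N/2 cycles (eight bank pairs times N/16 cycles each), and the example length N = 8192 are assumptions made for this sketch; they are not taken from the original design description.

```python
NUM_BANKS = 16  # 16-bank memory organization described in the paper


def radix2_fetch(cycle, n):
    """Return (address, (bank_a, bank_b)) for one radix-2 fetch cycle.

    Address follows Eq. (22): cycle mod (N/16).
    Banks follow Eq. (23): floor(cycle / (N/16)) and that value plus 8.
    """
    words_per_bank = n // NUM_BANKS
    addr = cycle % words_per_bank
    bank = cycle // words_per_bank
    return addr, (bank, bank + 8)


def radix2_schedule(n):
    """Yield the fetch tuple for each cycle of one stage.

    Assumption: one stage lasts N/2 cycles, i.e. 8 bank pairs x N/16.
    """
    for cycle in range(n // 2):
        yield radix2_fetch(cycle, n)
```

With N = 8192, `radix2_fetch(511, 8192)` still addresses banks 0 and 8, while `radix2_fetch(512, 8192)` moves on to banks 1 and 9 at address 0. Under the assumed stage length, the schedule touches each of the 16 x 512 memory locations exactly once.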

For the radix-4 FFT, the samples \(x(n)\) are fetched from the following memory locations into the FFT data-path.

$$\begin{aligned} \text{Addr}\left[ x(n) \right] = \left\lfloor \frac{(\#\text{cycle}) \bmod (N/16)}{2} \right\rfloor \end{aligned}$$
(24)
$$\begin{aligned} \text{Banks}\left[ x(n) \right] &= \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor \cdot \overline{(\#\text{cycle}) \bmod 2}, \quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 8 \right) \cdot \overline{(\#\text{cycle}) \bmod 2}, \\ &\quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 4 \right) \cdot \left[ (\#\text{cycle}) \bmod 2 \right], \quad \left( \left\lfloor \frac{\#\text{cycle}}{N/8} \right\rfloor + 12 \right) \cdot \left[ (\#\text{cycle}) \bmod 2 \right] \end{aligned}$$
(25)

The partial timing diagram of data fetching for the radix-4 algorithm is shown in Fig. 12. Two pairs of banks form a basic unit, and the data in the unit are fetched in order over 1,024 cycles. Because the two pairs are read on alternating cycles, as the \((\#\text{cycle}) \bmod 2\) terms in Eq. (25) indicate, the address increments every two clock cycles. After all the data in the active unit have been fetched, the bank numbers increment and fetching continues from the next unit.

Fig. 12 The timing diagram of data fetching for FFT operation based on the radix-4 algorithm
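The radix-4 addressing can be sketched in the same way. The Python function below transcribes Eqs. (24) and (25) exactly as printed; the name `radix4_fetch` and the example length N = 8192 are assumptions introduced for illustration only.

```python
def radix4_fetch(cycle, n):
    """Return (address, (bank_a, bank_b)) for one radix-4 fetch cycle.

    Address follows Eq. (24) as printed. With this modulus the value
    wraps every N/16 cycles.
    Banks follow Eq. (25): with u = floor(cycle / (N/8)), even cycles
    select banks u and u + 8, odd cycles select u + 4 and u + 12.
    """
    addr = (cycle % (n // 16)) // 2
    unit = cycle // (n // 8)
    if cycle % 2 == 0:
        return addr, (unit, unit + 8)
    return addr, (unit + 4, unit + 12)
```

For N = 8192, cycles 0 and 1 both use address 0 but read banks (0, 8) and (4, 12) respectively, so two consecutive cycles deliver four data words from one unit. At cycle 1024 the unit index becomes 1 and the even-cycle banks change to (1, 9).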

Finally, for the radix-8 FFT, the samples \(x(n)\) are fetched into the FFT data-path from the memory locations shown in Fig. 13. Four pairs of banks form a basic unit, and the data in the unit are fetched in order over 2,048 cycles; the address therefore increments every four clock cycles. After all the data in the active unit have been fetched, the bank numbers increment and fetching continues from the next unit.

Fig. 13 The timing diagram of data fetching for FFT operation based on the radix-8 algorithm
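The three cases described in this appendix follow a common pattern: one, two, and four bank pairs per unit for radix-2, radix-4, and radix-8. The short Python sketch below summarizes that pattern. The helper `fetch_timing` and the rule "pairs per unit = radix / 2" are assumptions inferred from the three cases, and the sketch does not attempt to reproduce the bank order shown in Fig. 13.

```python
def fetch_timing(radix, n, num_banks=16):
    """Summarize fetch timing for radix 2, 4, or 8 (hypothetical helper).

    Assumption: a unit holds radix/2 bank pairs, each bank stores
    N/num_banks words, and the pairs of a unit are read in turn.
    """
    if radix not in (2, 4, 8):
        raise ValueError("only radix 2, 4, and 8 are described")
    pairs_per_unit = radix // 2
    words_per_bank = n // num_banks
    return {
        "pairs_per_unit": pairs_per_unit,
        # cycles during which the same unit of banks stays active
        "cycles_per_unit": pairs_per_unit * words_per_bank,
        # cycles between two consecutive address increments
        "cycles_per_address": pairs_per_unit,
    }
```

For N = 8192 this gives 512, 1,024, and 2,048 cycles per unit and an address increment every one, two, and four cycles, matching the figures quoted for the three radices.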

Cite this article

Chen, KH. A Low-Memory-Access Length-Adaptive Architecture for 2\(^n\)-Point FFT. Circuits Syst Signal Process 34, 459–482 (2015). https://doi.org/10.1007/s00034-014-9862-x

  • DOI: https://doi.org/10.1007/s00034-014-9862-x
