Design Space Exploration of 1-D FFT Processor

Journal of Signal Processing Systems

Abstract

This paper proposes a design space exploration methodology for 1-D FFT processors that identifies the best hardware architecture quantitatively, early in the design process. The methodology comprises architecture candidate collection, coarse-grained architecture selection, and circuit-level design optimization. We show how to select a better architecture from candidates that span different architectures (SDF, SDC, MDF, MDC, and memory-based), different degrees of parallelism, and different radices. Sub-level designs, including the rotator and the data scaling module, are then introduced for further optimization. As a proof of concept, an FFT processor for 4G, WLAN, and future 5G is designed, supporting 16- to 4096-point and 12- to 2400-point FFTs. A memory-based architecture with a 16-datapath mixed-radix butterfly unit is selected to meet the 1 GS/s throughput requirement for the 4096-point FFT. Synthesis results in 65 nm technology show a silicon area of 1.46 mm² and a power consumption of 68.64 mW. The proposed processor achieves better normalized throughput per unit area and more normalized FFTs per unit energy than state-of-the-art designs.
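To give a feel for the kind of quantitative reasoning used in coarse-grained selection, the following is a minimal back-of-envelope sketch, not the authors' model. It assumes an idealized memory-based FFT that makes one pass over all N points per stage, consumes `lanes` samples per clock cycle, and has no I/O or pipeline overhead. The function names and the single-radix stage count are illustrative assumptions.

```python
def stage_count(n_points: int, radix: int) -> int:
    """Smallest number of passes of at most `radix` points that covers n_points.

    Assumption: the last pass may use a smaller radix (mixed radix),
    so 64 points with radix 16 still need two passes (16 x 4).
    """
    stages, covered = 0, 1
    while covered < n_points:
        covered *= radix
        stages += 1
    return stages


def required_clock_hz(n_points: int, lanes: int, radix: int,
                      throughput_sps: float) -> float:
    """Clock rate needed to sustain `throughput_sps` samples per second.

    Idealized model (assumption): every stage touches all n_points once,
    `lanes` samples are processed per cycle, and nothing else stalls.
    """
    if n_points % lanes != 0:
        raise ValueError("this sketch assumes lanes divides n_points")
    cycles_per_fft = stage_count(n_points, radix) * (n_points // lanes)
    # One FFT must finish every n_points / throughput_sps seconds.
    return cycles_per_fft * throughput_sps / n_points


if __name__ == "__main__":
    clock = required_clock_hz(n_points=4096, lanes=16, radix=16,
                              throughput_sps=1e9)
    print(f"required clock: {clock / 1e6:.1f} MHz")
```

Under these assumptions, a 4096-point FFT with 16 lanes needs three passes of 256 cycles each, so sustaining 1 GS/s would require a clock of roughly 187.5 MHz. A real design would also have to account for memory conflicts, I/O overlap, and pipeline latency, which this sketch ignores.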

Acknowledgments

The authors would like to thank Synopsys for their support in the use of ASIP Designer, which was used as the high-level synthesis tool.

The authors gratefully acknowledge financial support from the National High Technical Research and Development Program of China (863 program), grant 2014AA01A705.

Author information

Corresponding author

Correspondence to Dake Liu.

About this article

Cite this article

Liu, S., Liu, D. Design Space Exploration of 1-D FFT Processor. J Sign Process Syst 90, 1609–1621 (2018). https://doi.org/10.1007/s11265-018-1393-4

  • DOI: https://doi.org/10.1007/s11265-018-1393-4
