Abstract
A design space exploration methodology for 1-D FFT processors is proposed to identify the best hardware architecture quantitatively at an early design stage. The methodology comprises architecture candidate collection, coarse-grained architecture selection, and circuit-level design optimization. We show how to select the better architecture from candidates that span different architecture families (SDF, SDC, MDF, MDC, and memory-based), different degrees of parallelism, and different radices. Sub-level designs, including the rotator and the data scaling module, are introduced for further optimization. As a proof of concept, an FFT processor for 4G, WLAN, and future 5G is designed that supports 16- to 4096-point and 12- to 2400-point FFTs. A memory-based architecture with a 16-datapath mixed-radix butterfly unit is selected to meet the throughput requirement of 1 GS/s (4096-point). Synthesis results in a 65 nm technology show a silicon area of 1.46 mm² and a power consumption of 68.64 mW. The proposed processor achieves better normalized throughput per unit area and more normalized FFTs per unit energy than state-of-the-art designs.
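To make the idea of a mixed-radix butterfly unit concrete, the following minimal sketch splits an FFT length into a sequence of butterfly stages. The function name `mixed_radix_plan`, the radix set `(16, 8, 5, 4, 3, 2)`, and the greedy ordering are illustrative assumptions; the abstract states only the supported sizes and does not specify the radices or the stage schedule of the actual processor.

```python
def mixed_radix_plan(n, radices=(16, 8, 5, 4, 3, 2)):
    """Greedily split FFT length n into stages drawn from `radices`.

    The radix set is a hypothetical choice for illustration, not the
    configuration used in the paper. Returns a list of radices whose
    product is n, or None if n has a prime factor outside the set.
    """
    plan = []
    remaining = n
    for r in radices:  # try the largest radix first
        while remaining % r == 0:
            plan.append(r)
            remaining //= r
    return plan if remaining == 1 else None


if __name__ == "__main__":
    # Sizes taken from the two ranges quoted in the abstract.
    for size in (16, 4096, 12, 2400):
        print(size, mixed_radix_plan(size))
```

Under these assumptions a 4096-point transform maps to three radix-16 stages, while 2400 needs radix-3 and radix-5 stages in addition to powers of two, which is why a single-radix butterfly cannot cover both size ranges.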
Acknowledgments
The authors would like to thank Synopsys for their support in the use of ASIP Designer, which was used as a high-level synthesizer.
Financial support from the National High Technology Research and Development Program of China (863 Program), 2014AA01A705, is gratefully acknowledged.
Cite this article
Liu, S., Liu, D. Design Space Exploration of 1-D FFT Processor. J Sign Process Syst 90, 1609–1621 (2018). https://doi.org/10.1007/s11265-018-1393-4
DOI: https://doi.org/10.1007/s11265-018-1393-4