Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture

Wu, Dan; Zou, Xue-cheng; Dai, Kui; Rao, Jin-li; Chen, Pan; Zheng, Zhao-xia

doi:10.1631/jzus.C1100027

Published: 06 December 2011

Journal of Zhejiang University SCIENCE C, Volume 12, pages 976–989 (2011)

Abstract

The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper presents an implementation of the FFT on the Engineering and Scientific Computation Accelerator (ESCA), a heterogeneous multicore architecture designed to accelerate computation-intensive parallel computing in scientific and engineering applications. ESCA consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which the PEs communicate with each other via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit the architectural features of ESCA to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with that of a published work to show the superiority of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performance of ESCA is evaluated and compared with that of the IBM Cell processor and of a GPU to demonstrate the computing power of the ESCA system for high-performance applications.
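
The abstract does not describe the mapped algorithm itself, so the following is only a minimal sketch and not the authors' ESCA implementation. It shows a textbook iterative radix-2 Cooley-Tukey FFT in plain Python to illustrate why the FFT suits a SIMD PE array: within each stage, the n/2 butterflies are independent of one another and could in principle be executed in lockstep. The function names (bit_reverse, fft_radix2, naive_dft) are hypothetical and chosen for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Reorder the input into bit-reversed order so the stages can run in place.
    a = [0j] * n
    for i, v in enumerate(x):
        a[bit_reverse(i, bits)] = complex(v)

    half = 1
    while half < n:
        # One stage: every butterfly below touches a distinct pair of elements,
        # so all n/2 butterflies of a stage are independent (data parallel).
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        half *= 2
    return a


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the FFT result."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing fft_radix2 against naive_dft on a short input is a simple correctness check. How the butterflies and the inter-stage data exchange are distributed over PEs and the NoC is exactly what the paper addresses and is not modeled here.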

Author information

Authors and Affiliations

Department of Electronic Science & Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Dan Wu, Xue-cheng Zou, Kui Dai, Jin-li Rao, Pan Chen & Zhao-xia Zheng

Corresponding author

Correspondence to Kui Dai.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 60973035 and 60976027) and the Natural Science Foundation of Hubei Province, China (No. 2010CDB02705).

About this article

Cite this article

Wu, D., Zou, Xc., Dai, K. et al. Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture. J. Zhejiang Univ. - Sci. C 12, 976–989 (2011). https://doi.org/10.1631/jzus.C1100027

Received: 26 January 2011
Accepted: 26 July 2011
Published: 06 December 2011
Issue Date: December 2011
DOI: https://doi.org/10.1631/jzus.C1100027

CLC number

TP302.7
