Abstract
In this paper, we propose an implementation of a parallel 3-D real fast Fourier transform (FFT) with 2-D decomposition on Intel Xeon Phi clusters. The proposed implementation is based on the conjugate symmetry property of the discrete Fourier transform (DFT) and on the row-column FFT algorithm. The FFT kernels are vectorized using Intel Advanced Vector Extensions 512 (Intel AVX-512) instructions. We report performance results of the parallel 3-D real FFTs on Intel Xeon Phi clusters. We achieved a performance of over 10 TFlops for an \(8192^3\)-point FFT on 2048 nodes of a Fujitsu PRIMERGY CX1640 M1 cluster.
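As a hedged illustration of how the conjugate symmetry property and the row-column algorithm fit together, the following pure-Python sketch computes a small 3-D real FFT. It performs real FFTs along the contiguous (third) dimension and then complex FFTs along the second and first dimensions. Each real FFT packs a length-\(n\) real sequence into one length-\(n/2\) complex FFT, a standard trick that relies on conjugate symmetry; we assume this kernel for illustration only, and it is not necessarily the one used in the paper. The function names `fft`, `rfft`, `rfft3d` and `dft3d_direct` are hypothetical. The sketch omits the AVX-512 vectorization and the 2-D decomposition across processes.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rfft(x):
    """Length-n real FFT via one length-n/2 complex FFT (n a power of two, n >= 2).

    Conjugate symmetry X[n-k] = conj(X[k]) means only X[0..n/2] is returned.
    """
    n = len(x)
    h = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = fft([complex(x[2 * m], x[2 * m + 1]) for m in range(h)])
    out = []
    for k in range(h + 1):
        zk = z[k % h]
        zc = z[(h - k) % h].conjugate()
        e = (zk + zc) / 2   # DFT of the even-indexed samples
        o = (zk - zc) / 2j  # DFT of the odd-indexed samples
        out.append(e + cmath.exp(-2j * cmath.pi * k / n) * o)
    return out


def rfft3d(a):
    """Row-column 3-D real FFT of an n1 x n2 x n3 real array (nested lists)."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    m3 = n3 // 2 + 1
    # Step 1: real FFTs along the third dimension (keeps n3/2 + 1 outputs).
    b = [[rfft(a[i][j]) for j in range(n2)] for i in range(n1)]
    # Step 2: complex FFTs along the second dimension.
    for i in range(n1):
        for k in range(m3):
            col = fft([b[i][j][k] for j in range(n2)])
            for j in range(n2):
                b[i][j][k] = col[j]
    # Step 3: complex FFTs along the first dimension.
    for j in range(n2):
        for k in range(m3):
            col = fft([b[i][j][k] for i in range(n1)])
            for i in range(n1):
                b[i][j][k] = col[i]
    return b


def dft3d_direct(a, m3):
    """Direct O(N^2) 3-D DFT, restricted to k3 < m3, used only as a check."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * m3 for _ in range(n2)] for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            for k3 in range(m3):
                s = 0j
                for i in range(n1):
                    for j in range(n2):
                        for l in range(n3):
                            phase = k1 * i / n1 + k2 * j / n2 + k3 * l / n3
                            s += a[i][j][l] * cmath.exp(-2j * cmath.pi * phase)
                out[k1][k2][k3] = s
    return out


random.seed(0)
n1, n2, n3 = 4, 4, 8
a = [[[random.random() for _ in range(n3)] for _ in range(n2)] for _ in range(n1)]
y = rfft3d(a)
ref = dft3d_direct(a, n3 // 2 + 1)
max_err = max(abs(y[i][j][k] - ref[i][j][k])
              for i in range(n1) for j in range(n2) for k in range(n3 // 2 + 1))
```

Only \(n_3/2+1\) of the \(n_3\) outputs along the last dimension are stored, which roughly halves the memory and arithmetic compared with a complex 3-D FFT of the same size. In a distributed setting with 2-D decomposition, the three row-column stages would typically be separated by all-to-all transposes within process subgroups.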
Acknowledgments
This research used computational resources of the Oakforest-PACS provided by the Multidisciplinary Cooperative Research Program in Center for Computational Sciences, University of Tsukuba. This research was partially supported by JSPS KAKENHI Grant Number JP19K11989.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Takahashi, D. (2020). Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Intel Xeon Phi Clusters. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds.) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol. 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_14
DOI: https://doi.org/10.1007/978-3-030-43229-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science, Computer Science (R0)