An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors

Takahashi, Daisuke

doi:10.1007/978-3-319-62392-4_29

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10404)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

In this paper, we propose an implementation of a parallel one-dimensional real fast Fourier transform (FFT) on Intel Xeon Phi processors. The proposed implementation of the parallel one-dimensional real FFT is based on the conjugate symmetry property for the discrete Fourier transform (DFT) and the six-step FFT algorithm. We vectorized FFT kernels using the Intel Advanced Vector Extensions 512 (AVX-512) instructions, and parallelized the six-step FFT by using OpenMP. Performance results of one-dimensional FFTs on Intel Xeon Phi processors are reported. We successfully achieved a performance of over 91 GFlops on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores) for a \(2^{29}\)-point real FFT.
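
The two ingredients named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names (dft, transpose, six_step_fft, real_fft) and the factorization parameters n1 and n2 are hypothetical, a naive O(n^2) DFT stands in for the vectorized AVX-512 kernels, and no OpenMP parallelism is shown. The packing scheme used here (treating n real points as n/2 complex points and unpacking with conjugate symmetry) is one common way to exploit that property and is assumed rather than taken from the paper.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; a stand-in for an optimized small-FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transpose(m):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Complex FFT of length n = n1 * n2 using the six-step algorithm."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: transpose, so that a[j1][j2] = x[j1 + j2 * n1].
    a = transpose([x[j2 * n1:(j2 + 1) * n1] for j2 in range(n2)])
    # Step 2: n1 independent transforms of length n2.
    b = [dft(row) for row in a]
    # Step 3: multiply by the twiddle factors exp(-2*pi*i*j1*k2/n).
    b = [[b[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
          for k2 in range(n2)] for j1 in range(n1)]
    # Step 4: transpose.
    c = transpose(b)
    # Step 5: n2 independent transforms of length n1.
    d = [dft(row) for row in c]
    # Step 6: transpose and flatten, so that y[k2 + k1 * n2] = d[k2][k1].
    return [v for row in transpose(d) for v in row]


def real_fft(x, n1, n2):
    """Real FFT of even length n = 2 * n1 * n2; returns X[0..n/2].

    The remaining outputs follow from conjugate symmetry:
    X[n - k] = conj(X[k]).
    """
    n = len(x)
    m = n // 2
    assert n == 2 * n1 * n2
    # Pack even samples into real parts and odd samples into imaginary parts.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = six_step_fft(z, n1, n2)
    out = []
    for k in range(m + 1):
        zk = zf[k % m]
        zc = zf[(m - k) % m].conjugate()
        even = (zk + zc) / 2    # DFT of the even-indexed samples
        odd = (zk - zc) / 2j    # DFT of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

For example, real_fft(x, 2, 4) on a 16-point real sequence x returns the 9 values X[0] through X[8], which should agree with the first 9 entries of dft(x) up to rounding error. In this sketch the packing roughly halves the transform length, and the three transposes are the steps where cache blocking and thread-level parallelism would typically be applied in an optimized version.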

Acknowledgments

This research was partially supported by Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST).

Author information

Authors and Affiliations

Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
Daisuke Takahashi

Corresponding author

Correspondence to Daisuke Takahashi.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Trieste, Trieste, Italy
Giuseppe Borruso
Polytechnic University of Bari, Bari, Italy
Carmelo M. Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
University of Trieste, Trieste, Italy
Alfredo Cuzzocrea

About this paper

Cite this paper

Takahashi, D. (2017). An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_29

DOI: https://doi.org/10.1007/978-3-319-62392-4_29
Published: 06 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)
