An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs

  • Conference paper
Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2004)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3732)

Abstract

In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) using short vector SIMD instructions on clusters of PCs. We vectorized the FFT kernels using Intel’s Streaming SIMD Extensions 2 (SSE2) instructions. We show that combining this vectorization with a block three-dimensional FFT algorithm improves performance effectively. Performance results of three-dimensional FFTs on a dual Xeon 2.8 GHz PC SMP cluster are reported. We achieved over 5 GFLOPS on a 16-node dual Xeon 2.8 GHz PC SMP cluster.
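
The abstract does not reproduce the vectorized kernels, so the following is only a minimal sketch of what vectorizing an FFT kernel with SSE2 can look like, not the authors' code. It shows one radix-2 butterfly on double-precision complex values, where a 128-bit SSE2 register holds exactly one complex number as a (real, imaginary) pair. The function names `cmul_sse2`, `butterfly_sse2`, and `demo_butterfly` are hypothetical, and the sketch assumes an x86 or x86-64 compiler that provides `<emmintrin.h>` (on 32-bit x86 a flag such as `-msse2` may be needed).

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86 or x86-64 compiler */

/* Hypothetical helper: multiply complex a by complex w.
 * Each __m128d holds one complex double as [low = re, high = im]. */
static inline __m128d cmul_sse2(__m128d a, __m128d w)
{
    const __m128d neg_low = _mm_set_pd(0.0, -0.0); /* sign bit set in low lane only */
    __m128d wr = _mm_unpacklo_pd(w, w);            /* [w.re, w.re] */
    __m128d wi = _mm_unpackhi_pd(w, w);            /* [w.im, w.im] */
    __m128d a_swapped = _mm_shuffle_pd(a, a, 1);   /* [a.im, a.re] */
    __m128d t1 = _mm_mul_pd(a, wr);                /* [a.re*w.re, a.im*w.re] */
    __m128d t2 = _mm_mul_pd(a_swapped, wi);        /* [a.im*w.im, a.re*w.im] */
    t2 = _mm_xor_pd(t2, neg_low);                  /* [-a.im*w.im, a.re*w.im] */
    return _mm_add_pd(t1, t2);                     /* [re(a*w), im(a*w)] */
}

/* Hypothetical radix-2 butterfly, in place:
 *   x <- x + w*y,  y <- x - w*y
 * x, y, and w each point to two doubles: {re, im}. */
void butterfly_sse2(double x[2], double y[2], const double w[2])
{
    __m128d a = _mm_loadu_pd(x); /* unaligned loads keep the sketch simple */
    __m128d t = cmul_sse2(_mm_loadu_pd(y), _mm_loadu_pd(w));
    _mm_storeu_pd(x, _mm_add_pd(a, t));
    _mm_storeu_pd(y, _mm_sub_pd(a, t));
}

/* Worked example: x = 1+2i, y = 3+4i, twiddle w = i, so w*y = -4+3i. */
void demo_butterfly(void)
{
    double x[2] = {1.0, 2.0};
    double y[2] = {3.0, 4.0};
    const double w[2] = {0.0, 1.0};

    butterfly_sse2(x, y, w);

    assert(x[0] == -3.0 && x[1] == 5.0);  /* x + w*y = -3+5i */
    assert(y[0] == 5.0 && y[1] == -1.0);  /* x - w*y =  5-1i */
}
```

Calling `demo_butterfly()` from a `main` function exercises the kernel on the worked example. A production kernel would typically use aligned loads and process many butterflies per loop iteration; the unaligned single-butterfly form here is chosen only for readability.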

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takahashi, D., Boku, T., Sato, M. (2006). An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_139

  • DOI: https://doi.org/10.1007/11558958_139

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29067-4

  • Online ISBN: 978-3-540-33498-9

  • eBook Packages: Computer Science, Computer Science (R0)
