Automatic Tuning for Parallel FFTs

Software Automatic Tuning

Abstract

In this paper, we propose an implementation of parallel fast Fourier transforms (FFTs) with automatic performance tuning on distributed-memory parallel computers. A blocking algorithm for parallel FFTs uses cache memory effectively. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. In addition, parallel FFTs require intensive all-to-all communication, which affects their performance, so automatic tuning of the all-to-all communication is also implemented. The performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning improves performance.
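As a rough illustration of the block-size tuning idea described in the abstract, the following minimal sketch searches a list of candidate block sizes for a cache-blocked kernel and keeps the cheapest one. It is not the chapter's method: the kernel here is a blocked matrix transpose standing in for the FFT data movement, measured wall-clock time is used as a proxy for cache misses, and the names `blocked_transpose`, `measure`, and `tune_block_size` are hypothetical.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n x n matrix (flat row-major list) one tile at a time."""
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # Work on one block x block tile; min() handles ragged edges.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def measure(kernel, block, repeats=3):
    """Assumed cost model: best wall-clock time over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(block)
        best = min(best, time.perf_counter() - start)
    return best


def tune_block_size(kernel, candidates, cost=measure):
    """Return the candidate block size with the lowest cost."""
    return min(candidates, key=lambda b: cost(kernel, b))


if __name__ == "__main__":
    n = 256
    data = list(range(n * n))
    best = tune_block_size(lambda b: blocked_transpose(data, n, b),
                           [4, 8, 16, 32, 64])
    print("selected block size:", best)
```

Because the best block size may change with the problem size, a tuner of this kind would be rerun (or its result cached) per problem size. The `cost` parameter is left pluggable so that a hardware cache-miss counter could replace timing where one is available.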

Author information

Corresponding author

Correspondence to Daisuke Takahashi.

Copyright information

© 2011 Springer New York

About this chapter

Cite this chapter

Takahashi, D. (2011). Automatic Tuning for Parallel FFTs. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_4

  • DOI: https://doi.org/10.1007/978-1-4419-6935-4_4

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-6934-7

  • Online ISBN: 978-1-4419-6935-4

  • eBook Packages: Engineering, Engineering (R0)
