Automatic Tuning for Parallel FFTs

Takahashi, Daisuke

doi:10.1007/978-1-4419-6935-4_4

Abstract

In this paper, we propose an implementation of parallel fast Fourier transforms (FFTs) with automatic performance tuning on distributed-memory parallel computers. A blocking algorithm for parallel FFTs uses cache memory effectively. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. In addition, parallel FFTs require intensive all-to-all communication, which affects their performance, so automatic tuning of the all-to-all communication is also implemented. Performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning is effective in improving performance.
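
To make the idea of block-size tuning concrete, the following is a minimal sketch of an empirical search, not the method of the chapter. It assumes a tiled matrix transpose (a data-movement step found in many blocked FFT formulations) as the kernel, and it picks the block size by wall-clock timing rather than by counting cache misses as the abstract describes. The names `blocked_transpose` and `tune_block_size`, the candidate list, and the problem size are all hypothetical. Pure Python will not show realistic cache behavior; only the structure of the search is the point.

```python
import time


def blocked_transpose(a, n, nb):
    """Transpose an n-by-n row-major matrix (flat list) in nb-by-nb tiles."""
    b = [0.0] * (n * n)
    for ii in range(0, n, nb):
        for jj in range(0, n, nb):
            # Work on one tile at a time so its elements are reused
            # while they are still likely to be in cache.
            for i in range(ii, min(ii + nb, n)):
                for j in range(jj, min(jj + nb, n)):
                    b[j * n + i] = a[i * n + j]
    return b


def tune_block_size(n, candidates, repeats=3):
    """Return the candidate block size with the smallest best-of-repeats time.

    Assumption: timing stands in for a cache-miss estimate in this sketch.
    """
    a = [float(k) for k in range(n * n)]
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_transpose(a, n, nb)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb


if __name__ == "__main__":
    # Hypothetical problem size and candidates; the result is machine dependent.
    print(tune_block_size(256, [4, 8, 16, 32, 64]))
```

Because the best block size may change with the problem size, a tuner of this kind would typically be rerun (or its results cached) for each problem size of interest.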

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
Daisuke Takahashi

Corresponding author

Correspondence to Daisuke Takahashi.

Editor information

Editors and Affiliations

Central Research Laboratory, Hitachi Ltd., Higashi-Koigakubo 1-280, Kokubunji-shi, Tokyo, 185-8601, Japan
Ken Naono
Cray, Inc., Jackson St. 380, St Paul, 55101, Minnesota, USA
Keita Teranishi
Dept. Computer & Information Sciences, University of Delaware, Smith Hall 101, Newark, 19716, Delaware, USA
John Cavazos
Dept. Computer Science, University of Tokyo, Hongo 7-3-1, Tokyo, 113-0033, Japan
Reiji Suda

About this chapter

Cite this chapter

Takahashi, D. (2011). Automatic Tuning for Parallel FFTs. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_4

DOI: https://doi.org/10.1007/978-1-4419-6935-4_4
Published: 13 August 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6934-7
Online ISBN: 978-1-4419-6935-4
eBook Packages: Engineering, Engineering (R0)
