Abstract
In this paper, we present a high-performance method for parallelizing the FFT. The well-known four-step and six-step parallel algorithms with the standard index map are not well suited to highly parallel computers, because they require all-to-all communication between the two phases of sub-FFTs, and this communication cannot be overlapped with the computation of each sub-FFT. We introduce another index map and an algorithm intended to overcome this problem. Our method outperforms the four-step method in 26 of 32 experiments. The results were obtained on an NEC Cenju-3 with up to 128 processors, using the mini-MPI library.
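For orientation, the conventional four-step decomposition that the abstract uses as its baseline can be sketched as follows. This is a minimal serial illustration of the textbook scheme for N = N1 × N2 with the index map n = n1 + N1·n2, k = N2·k1 + k2; it is not the authors' new index map, which the abstract does not specify. The names `dft` and `four_step_fft` are hypothetical, and the naive O(n²) `dft` merely stands in for a real sub-FFT routine.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, used here as a stand-in for each sub-FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """Textbook four-step DFT of length n1*n2 (illustrative, serial)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 independent sub-DFTs of length n2 over the strided input
    # x[a + n1*b], b = 0..n2-1 (one per value of a).
    rows = [dft([x[a + n1 * b] for b in range(n2)]) for a in range(n1)]
    # Step 2: multiply by the twiddle factors w_n^(a*k2).
    for a in range(n1):
        for k2 in range(n2):
            rows[a][k2] *= cmath.exp(-2j * cmath.pi * a * k2 / n)
    # Step 3: transpose. On a distributed-memory machine this is the
    # all-to-all exchange that separates the two sub-FFT phases.
    cols = [[rows[a][k2] for a in range(n1)] for k2 in range(n2)]
    # Step 4: n2 independent sub-DFTs of length n1; the result for (k1, k2)
    # is stored at output index n2*k1 + k2.
    out = [0j] * n
    for k2 in range(n2):
        y = dft(cols[k2])
        for k1 in range(n1):
            out[n2 * k1 + k2] = y[k1]
    return out
```

In this sketch, step 4 cannot begin until the transpose in step 3 has delivered every element it needs, which is the dependency the abstract identifies as the obstacle to overlapping communication with computation.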
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shimizu, N., Watanabe, T. (1997). High performance parallel FFT on distributed memory parallel computers. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024226
DOI: https://doi.org/10.1007/BFb0024226
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive