Abstract
In the present paper, we propose a hybrid MPI/OpenMP implementation of a parallel three-dimensional fast Fourier transform (FFT) algorithm on SMP clusters. The three-dimensional FFT algorithm can be reorganized into a block three-dimensional FFT algorithm in order to reduce the number of cache misses. We then use this block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT. We obtained a performance of over 14 GFLOPS on the AIST Super Cluster M-64 (using 32 nodes out of 132 available, Itanium2 1.3 GHz, 4-way SMP).
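The blocking idea mentioned in the abstract can be illustrated with a small serial sketch. It is not the paper's implementation: the function names (`fft1d`, `transform_lines_blocked`, `fft3d_blocked`, `dft3d_naive`), the block size `nb`, and the flat first-index-fastest storage layout are all assumptions made for illustration. The sketch computes a 3-D FFT as 1-D FFTs along each axis, and for each axis it gathers `nb` lines at a time into a small contiguous work buffer before transforming them, which is one plausible way to keep the working set cache-resident. In a hybrid code, the loop over blocks is where thread-level (OpenMP-style) parallelism would most naturally apply, with data redistributed between nodes by message passing; neither is modeled here.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transform_lines_blocked(a, starts, stride, length, nb):
    """FFT every strided line of `a` in place, nb lines per block (hypothetical helper)."""
    for b in range(0, len(starts), nb):
        block = starts[b:b + nb]
        # Gather: copy up to nb strided lines into contiguous work rows.
        work = [[a[s + stride * t] for t in range(length)] for s in block]
        # Transform each row while the block is small enough to stay in cache.
        work = [fft1d(row) for row in work]
        # Scatter: write the transformed rows back to their strided positions.
        for row, s in zip(work, block):
            for t in range(length):
                a[s + stride * t] = row[t]


def fft3d_blocked(a, n1, n2, n3, nb=4):
    """3-D FFT of a flat list with index i + n1*(j + n2*k) (assumed layout)."""
    a = list(a)
    # Axis 1: lines are contiguous (stride 1).
    starts = [n1 * (j + n2 * k) for k in range(n3) for j in range(n2)]
    transform_lines_blocked(a, starts, 1, n1, nb)
    # Axis 2: stride n1.
    starts = [i + n1 * n2 * k for k in range(n3) for i in range(n1)]
    transform_lines_blocked(a, starts, n1, n2, nb)
    # Axis 3: stride n1*n2.
    starts = [i + n1 * j for j in range(n2) for i in range(n1)]
    transform_lines_blocked(a, starts, n1 * n2, n3, nb)
    return a


def dft3d_naive(a, n1, n2, n3):
    """Direct O(N^2) 3-D DFT in the same layout, used only as a reference."""
    out = []
    for k3 in range(n3):
        for k2 in range(n2):
            for k1 in range(n1):
                s = 0j
                for k in range(n3):
                    for j in range(n2):
                        for i in range(n1):
                            phase = i * k1 / n1 + j * k2 / n2 + k * k3 / n3
                            s += a[i + n1 * (j + n2 * k)] * cmath.exp(-2j * cmath.pi * phase)
                out.append(s)
    return out
```

Calling `fft3d_blocked(data, 4, 2, 2, nb=3)` on a 16-element list should agree with `dft3d_naive(data, 4, 2, 2)` up to rounding error. The result does not depend on `nb`; in this sketch the block size only changes how many lines are gathered per pass.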
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takahashi, D. (2006). A Hybrid MPI/OpenMP Implementation of a Parallel 3-D FFT on SMP Clusters. In: Wyrzykowski, R., Dongarra, J., Meyer, N., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2005. Lecture Notes in Computer Science, vol 3911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752578_117
DOI: https://doi.org/10.1007/11752578_117
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34141-3
Online ISBN: 978-3-540-34142-0
eBook Packages: Computer Science, Computer Science (R0)