Abstract
In this paper, we propose a blocking algorithm for a parallel one-dimensional fast Fourier transform (FFT) on clusters of PCs. Our proposed parallel FFT algorithm is based on the six-step FFT algorithm. The six-step FFT algorithm can be altered into a block nine-step FFT algorithm to reduce the number of cache misses. The block nine-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the block nine-step FFT algorithm to design the parallel one-dimensional FFT algorithm. In our proposed parallel FFT algorithm, because we use a cyclic distribution, all-to-all communication is required only once. Moreover, both the input data and the output data can be given in natural order. We achieved performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
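For orientation, the six-step FFT algorithm that the abstract builds on can be sketched as follows. This is a minimal serial illustration in NumPy, not the authors' block nine-step or parallel implementation: the function name `six_step_fft` and the parameters `n1` and `n2` (a factorization of the transform length) are assumptions made for this example, and the library routine `np.fft.fft` stands in for the short row FFTs.

```python
import numpy as np

def six_step_fft(x, n1, n2):
    """Serial sketch of the six-step FFT for length n = n1 * n2.

    Hypothetical helper for illustration only; it does not include the
    cache blocking or the cyclic distribution described in the abstract.
    """
    x = np.asarray(x, dtype=complex)
    n = n1 * n2
    if x.shape != (n,):
        raise ValueError("input length must equal n1 * n2")

    # View the input as an n2 x n1 array: a[j2, j1] = x[j1 + j2 * n1].
    a = x.reshape(n2, n1)

    # Step 1: transpose to n1 x n2 so each row holds a stride-n1 subsequence.
    b = np.ascontiguousarray(a.T)

    # Step 2: n1 independent FFTs of length n2 along the rows.
    c = np.fft.fft(b, axis=1)

    # Step 3: multiply by the twiddle factors exp(-2*pi*i * j1 * k2 / n).
    j1 = np.arange(n1).reshape(n1, 1)
    k2 = np.arange(n2).reshape(1, n2)
    c = c * np.exp(-2j * np.pi * j1 * k2 / n)

    # Step 4: transpose to n2 x n1.
    d = np.ascontiguousarray(c.T)

    # Step 5: n2 independent FFTs of length n1 along the rows.
    e = np.fft.fft(d, axis=1)

    # Step 6: transpose back; flattening gives output index k2 + k1 * n2,
    # which is the natural order of the length-n transform.
    return e.T.reshape(n)
```

The three transposes are the steps that move data between a row-wise and a column-wise layout; in a distributed setting such a transpose is where inter-node communication typically arises, which is why reducing it to a single all-to-all exchange matters.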
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takahashi, D., Boku, T., Sato, M. (2002). A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_96
DOI: https://doi.org/10.1007/3-540-45706-2_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive