Abstract
In this paper, we propose an OpenMP implementation of a recursive algorithm for the parallel fast Fourier transform (FFT) on shared-memory parallel computers. A recursive three-step FFT algorithm improves performance by using the cache memory effectively. We report performance results for one-dimensional FFTs on the DELL PowerEdge 7150 and the hp workstation zx6000. For a 2^24-point FFT, we achieved about 757 MFLOPS on the DELL PowerEdge 7150 (Itanium 800 MHz, 4 CPUs) and about 871 MFLOPS on the hp workstation zx6000 (Itanium2 1 GHz, 2 CPUs).
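The abstract does not spell out the three steps. As a rough illustration only, the sketch below shows the standard decomposition of an N = n1 * n2 point FFT into short FFTs, a twiddle-factor multiplication, and a second set of short FFTs, applied recursively so that the sub-transforms eventually become small enough to fit in cache. This is an assumption about the general technique, not the authors' implementation: the names `recursive_fft`, `split`, `dft` and the `base` cutoff are hypothetical, and the code is serial pure Python rather than OpenMP.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used as the base case for small or prime lengths."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def split(n):
    """Return (n1, n2) with n1 * n2 == n and n1 the largest factor <= sqrt(n)."""
    n1 = int(n ** 0.5)
    while n % n1:
        n1 -= 1
    return n1, n // n1

def recursive_fft(x, base=8):
    """Sketch of a recursive decomposition N = n1 * n2 (hypothetical helper)."""
    n = len(x)
    if n <= base:
        return dft(x)
    n1, n2 = split(n)
    if n1 == 1:  # prime length: no useful factorization, fall back
        return dft(x)
    # Step 1: n1 independent FFTs of length n2 over the strided subsequences.
    # These are independent of each other, so a parallel loop could run them.
    rows = [recursive_fft(x[j1::n1], base) for j1 in range(n1)]
    # Step 2: multiply by the twiddle factors w_N^(j1 * k2).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent FFTs of length n1; the result for (k1, k2)
    # is output element k2 + k1 * n2.
    out = [0j] * n
    for k2 in range(n2):
        col = recursive_fft([rows[j1][k2] for j1 in range(n1)], base)
        for k1 in range(n1):
            out[k2 + k1 * n2] = col[k1]
    return out
```

In a shared-memory setting, the loops in steps 1 and 3 iterate over independent sub-transforms, which is the kind of loop an OpenMP work-sharing directive would typically target; how the paper actually distributes the work is not stated in the abstract.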
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takahashi, D., Sato, M., Boku, T. (2003). An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_8
DOI: https://doi.org/10.1007/3-540-45009-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40435-4
Online ISBN: 978-3-540-45009-2
eBook Packages: Springer Book Archive