Abstract
In the present paper, an implementation of a parallel one-dimensional fast Fourier transform (FFT) using Streaming SIMD Extensions 3 (SSE3) instructions on dual-core processors is proposed. Combination of vectorization and the block six-step FFT algorithm is shown to effectively improve performance. The performance results for one-dimensional FFTs on dual-core Intel Xeon processors are reported. We successfully achieved performance of approximately 2006 MFLOPS on a dual-core Intel Xeon PC (2.8 GHz, two CPUs, four cores) and approximately 3492 MFLOPS on a dual-core Intel Xeon 5150 PC (2.66 GHz, two CPUs, four cores) for a 220-point FFT.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965)
Nadehara, K., Miyazaki, T., Kuroda, I.: Radix-4 FFT implementation using SIMD multimedia instructions. In: Proc. 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1999), vol. 4, pp. 2131–2134 (1999)
Franchetti, F., Karner, H., Kral, S., Ueberhuber, C.W.: Architecture independent short vector FFTs. In: Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), vol. 2, pp. 1109–1112 (2001)
Rodriguez, V.P.: A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures. In: Proc. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), vol. 3, pp. 3220–3223 (2002)
Kral, S., Franchetti, F., Lorenz, J., Ueberhuber, C.W.: SIMD vectorization of straight line FFT code. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 251–260. Springer, Heidelberg (2003)
Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93, 216–231 (2005)
Franchetti, F., Kral, S., Lorenz, J., Ueberhuber, C.W.: Efficient utilization of SIMD extensions. Proc. IEEE 93, 409–425 (2005)
Intel Corporation: IA-32 Intel Architecture Software Developer’s Manual. Basic Architecture, vol. 1 (2006)
Intel Corporation: Intel C++ Compiler Documentation (2006)
Bailey, D.H.: FFTs in external or hierarchical memory. The Journal of Supercomputing 4, 23–35 (1990)
Van Loan, C.: Computational Frameworks for the Fast Fourier Transform. SIAM Press, Philadelphia (1992)
Takahashi, D.: A blocking algorithm for FFT on cache-based processors. In: Hertzberger, B., Hoekstra, A.G., Williams, R. (eds.) High-Performance Computing and Networking. LNCS, vol. 2110, pp. 551–554. Springer, Heidelberg (2001)
Swarztrauber, P.N.: FFT algorithms for vector computers. Parallel Computing 1, 45–63 (1984)
Frigo, M., Johnson, S.G.: FFTW, http://www.fftw.org
Intel Corporation: Intel Math Kernel Library Reference Manual (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takahashi, D. (2007). An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_135
Download citation
DOI: https://doi.org/10.1007/978-3-540-75755-9_135
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer ScienceComputer Science (R0)