Abstract
We have developed a high-performance FFT for the SGI Altix 3700. It uses a variant of loop fusion to make the floating-point operations required to compute the FFT more efficient. As a result, we achieved 4.94 Gflops (95% of peak) for a 1-D FFT of length 4096 on a single 1.3 GHz Itanium 2 processor, and 28 Gflops for a 2-D FFT of size 4096 × 4096 on 32 processors. Our FFT kernel outperformed other existing libraries.
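The abstract does not spell out the fusion scheme, so the following is only a generic illustration of the idea. It is not the authors' kernel. The sketch is an iterative radix-2 decimation-in-time FFT in which the twiddle-factor multiplication is folded into the butterfly loop instead of being applied in a separate pass. The names `fft_fused` and `fft_gflops` are hypothetical. `fft_gflops` assumes the common convention of counting 5·N·log2(N) flops per complex transform, and the paper may count differently. The 95% figure is consistent with an assumed peak of 4 flops/cycle (two fused multiply-add units) at 1.3 GHz, i.e. 5.2 Gflops, since 4.94 / 5.2 ≈ 0.95.

```python
import cmath
import math


def fft_fused(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two.

    Illustrative only: the twiddle multiply is fused into the butterfly
    loop, so each stage makes a single read/write pass over the data.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                # Fused step: apply the twiddle factor and the butterfly
                # together instead of scaling the data in a separate loop.
                w = cmath.exp(-2j * math.pi * k / size)
                t = w * a[start + k + half]
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a


def fft_gflops(n, seconds):
    """Flop rate under the assumed 5 * n * log2(n) counting convention."""
    return 5.0 * n * math.log2(n) / seconds / 1e9


# Assumed peak for a 1.3 GHz Itanium 2: 2 FMA units x 2 flops x 1.3 GHz.
ASSUMED_PEAK_GFLOPS = 1.3 * 4
```

A pure-Python loop like this only shows the data flow. On real hardware the benefit of fusion comes from fewer passes over memory and better use of fused multiply-add instructions, which this sketch does not model.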
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nukada, A., Takahashi, D., Suda, R., Nishida, A. (2007). High Performance FFT on SGI Altix 3700. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds) High Performance Computing and Communications. HPCC 2007. Lecture Notes in Computer Science, vol 4782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75444-2_40
DOI: https://doi.org/10.1007/978-3-540-75444-2_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75443-5
Online ISBN: 978-3-540-75444-2
eBook Packages: Computer Science, Computer Science (R0)