High Performance FFT on SGI Altix 3700

  • Conference paper
High Performance Computing and Communications (HPCC 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4782)

Abstract

We have developed a high-performance FFT for the SGI Altix 3700 that improves the efficiency of the floating-point operations required to compute the FFT by using a kind of loop fusion technique. As a result, we achieved 4.94 Gflops for a 1-D FFT of length 4096 on a 1.3 GHz Itanium 2 (95% of peak), and 28 Gflops for a 2-D FFT of size 4096 × 4096 on 32 processors. Our FFT kernel outperformed other existing libraries.
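The "95% of peak" figure can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the Itanium 2 peak is taken as 4 floating-point operations per cycle (two fused multiply-add units), and that Gflops are counted with the common benchmark convention of 5 N log2 N operations for a complex FFT of N points. The function names are hypothetical.

```python
import math

def nominal_fft_flops(n_points: int) -> float:
    """Nominal flop count 5 * N * log2(N) for a complex FFT of N points.

    Assumption: this is the usual benchmark convention, not necessarily
    the exact count used in the paper.
    """
    return 5.0 * n_points * math.log2(n_points)

def peak_gflops(clock_ghz: float, flops_per_cycle: int = 4) -> float:
    """Peak rate, assuming 4 flops per cycle (two FMA units) on Itanium 2."""
    return clock_ghz * flops_per_cycle

# Figures quoted in the abstract.
measured_1d_gflops = 4.94   # 1-D FFT, length 4096, one 1.3 GHz Itanium 2
measured_2d_gflops = 28.0   # 2-D FFT, 4096 x 4096, 32 processors

peak = peak_gflops(1.3)                  # 5.2 Gflops under the assumption above
efficiency = measured_1d_gflops / peak   # fraction of peak for the 1-D case

# Nominal run times implied by the quoted rates.
flops_1d = nominal_fft_flops(4096)
flops_2d = nominal_fft_flops(4096 * 4096)
time_1d_us = flops_1d / (measured_1d_gflops * 1e9) * 1e6
time_2d_ms = flops_2d / (measured_2d_gflops * 1e9) * 1e3

print(f"peak = {peak:.2f} Gflops, efficiency = {efficiency:.1%}")
print(f"1-D: {flops_1d:.0f} flops, about {time_1d_us:.1f} microseconds")
print(f"2-D: {flops_2d:.0f} flops, about {time_2d_ms:.1f} milliseconds")
```

Under these assumptions the single-processor result corresponds to a transform time of roughly 50 microseconds, and the ratio of 4.94 to 5.2 Gflops matches the 95% quoted in the abstract.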

Editor information

Ronald Perrott, Barbara M. Chapman, Jaspal Subhlok, Rodrigo Fernandes de Mello, Laurence T. Yang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nukada, A., Takahashi, D., Suda, R., Nishida, A. (2007). High Performance FFT on SGI Altix 3700. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds) High Performance Computing and Communications. HPCC 2007. Lecture Notes in Computer Science, vol 4782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75444-2_40

  • DOI: https://doi.org/10.1007/978-3-540-75444-2_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75443-5

  • Online ISBN: 978-3-540-75444-2

  • eBook Packages: Computer Science, Computer Science (R0)
