Automatically Tuned FFTs for BlueGene/L’s Double FPU

  • Conference paper
High Performance Computing for Computational Science - VECPAR 2004 (VECPAR 2004)

Abstract

IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system with a peak performance of 360 Tflop/s. This system is expected to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L’s custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic performance tuning systems Spiral and Fftw. Our approach produces automatically tuned high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.
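As a rough illustration of why a two-way SIMD unit suits FFT kernels, the sketch below models one two-way vector as a pair of doubles holding the real and imaginary parts of a complex number, and builds a radix-2 butterfly from lane-wise operations. This is not the code generated by Spiral, Fftw, or the Vienna MAP vectorizer; the helper names (`vadd`, `vsub`, `cmul`, `butterfly`) are hypothetical and plain Python tuples stand in for the hardware registers.

```python
# Hypothetical model, not the paper's generated code:
# a two-way vector is a pair of doubles (lane 0, lane 1),
# and one complex number occupies one vector as (real, imag).

def vadd(a, b):
    """Lane-wise add of two two-way vectors."""
    return (a[0] + b[0], a[1] + b[1])

def vsub(a, b):
    """Lane-wise subtract of two two-way vectors."""
    return (a[0] - b[0], a[1] - b[1])

def cmul(a, w):
    """Complex multiply a*w; the cross terms mix lanes 0 and 1."""
    ar, ai = a
    wr, wi = w
    return (ar * wr - ai * wi, ai * wr + ar * wi)

def butterfly(x0, x1, w):
    """Radix-2 FFT butterfly: returns (x0 + w*x1, x0 - w*x1)."""
    t = cmul(x1, w)
    return vadd(x0, t), vsub(x0, t)

if __name__ == "__main__":
    # Twiddle factor w = -i, inputs x0 = 1+2i and x1 = 3-i.
    y0, y1 = butterfly((1.0, 2.0), (3.0, -1.0), (0.0, -1.0))
    print(y0, y1)
```

In this model the additions and subtractions are purely lane-wise, while the complex multiply needs data to cross between lanes; handling that efficiently is the kind of problem a short vector code generator has to solve.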

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Franchetti, F., Kral, S., Lorenz, J., Püschel, M., Ueberhuber, C.W. (2005). Automatically Tuned FFTs for BlueGene/L’s Double FPU. In: Daydé, M., Dongarra, J., Hernández, V., Palma, J.M.L.M. (eds) High Performance Computing for Computational Science - VECPAR 2004. VECPAR 2004. Lecture Notes in Computer Science, vol 3402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11403937_3

  • DOI: https://doi.org/10.1007/11403937_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25424-9

  • Online ISBN: 978-3-540-31854-5

  • eBook Packages: Computer Science (R0)
