Abstract
IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system with a peak performance of 360 Tflop/s. This system is expected to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L’s custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic performance tuning systems Spiral and Fftw. Our approach produces automatically tuned high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.
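To make the idea of a two-way short vector unit concrete, the following is a minimal sketch of a radix-2 FFT butterfly written purely in terms of two-element vector operations, with one complex number held as a (real, imaginary) pair. It is an illustration under stated assumptions, not the code generated by Spiral, Fftw, or the Vienna MAP vectorizer: the helper names `vadd`, `vsub`, `vmul`, `cmul`, and `butterfly` are hypothetical, and Python tuples merely stand in for hardware vector registers.

```python
# Illustrative sketch only. A 2-tuple models one two-way vector register.
# All names below are hypothetical; none come from the paper or from any
# BlueGene/L toolchain.

def vadd(a, b):
    """Lane-wise addition of two 2-way vectors."""
    return (a[0] + b[0], a[1] + b[1])

def vsub(a, b):
    """Lane-wise subtraction of two 2-way vectors."""
    return (a[0] - b[0], a[1] - b[1])

def vmul(a, b):
    """Lane-wise multiplication of two 2-way vectors."""
    return (a[0] * b[0], a[1] * b[1])

def cmul(x, w):
    """Complex product x*w, with x = (re, im) and w = (re, im)."""
    # Broadcast the real part of w: (xr*wr, xi*wr)
    t0 = vmul(x, (w[0], w[0]))
    # Swap the lanes of x and broadcast the imaginary part of w:
    # (xi*wi, xr*wi)
    t1 = vmul((x[1], x[0]), (w[1], w[1]))
    # Subtract in lane 0, add in lane 1. On real hardware this asymmetric
    # step is assumed to need a special instruction or extra shuffles.
    return (t0[0] - t1[0], t0[1] + t1[1])

def butterfly(a, b, w):
    """Radix-2 butterfly: returns (a + w*b, a - w*b)."""
    t = cmul(b, w)
    return vadd(a, t), vsub(a, t)

if __name__ == "__main__":
    # a = 1+2i, b = 3+4i, twiddle w = -i
    print(butterfly((1.0, 2.0), (3.0, 4.0), (0.0, -1.0)))
```

The sketch shows why complex arithmetic is a natural fit for a two-way unit (one complex number fills one vector) and also where the difficulty lies: the multiply needs lane swaps and a mixed subtract/add, which is the kind of detail a vectorizer has to map onto the available instructions.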
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Franchetti, F., Kral, S., Lorenz, J., Püschel, M., Ueberhuber, C.W. (2005). Automatically Tuned FFTs for BlueGene/L’s Double FPU. In: Daydé, M., Dongarra, J., Hernández, V., Palma, J.M.L.M. (eds) High Performance Computing for Computational Science - VECPAR 2004. VECPAR 2004. Lecture Notes in Computer Science, vol 3402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11403937_3
DOI: https://doi.org/10.1007/11403937_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25424-9
Online ISBN: 978-3-540-31854-5
eBook Packages: Computer Science (R0)