Automatic Performance Optimization of the Discrete Fourier Transform on Distributed Memory Computers

  • Conference paper
Parallel and Distributed Processing and Applications (ISPA 2006)

Abstract

This paper introduces a formal framework for automatically generating performance-optimized implementations of the discrete Fourier transform (DFT) for distributed memory computers. The framework is implemented as part of the program generation and optimization system Spiral. DFT algorithms are represented as mathematical formulas in Spiral's internal language SPL. Using a tagging mechanism and formula rewriting, we extend Spiral to automatically generate parallelized formulas. Using the same mechanism, we enable the generation of rescaling DFT algorithms, which redistribute the data in intermediate steps to fewer processors to reduce communication overhead. A novel feature of these methods is that the redistribution steps are merged with the communication steps of the algorithm, so they add no communication overhead of their own. Among the possible alternative algorithms, Spiral's search mechanism determines the fastest for a given platform, effectively generating adapted code without human intervention. Experiments with DFT MPI programs generated by Spiral show performance gains of up to 30% due to rescaling. Further, our generated programs compare favorably with FFTW-MPI 2.1.5.
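As a rough illustration of what "DFT algorithms represented as SPL formulas" means, the sketch below evaluates the well-known Cooley-Tukey rule in SPL-style notation, DFT_{mn} = (DFT_m ⊗ I_n) T^{mn}_n (I_m ⊗ DFT_n) L^{mn}_m. Here ⊗ is the tensor (Kronecker) product, I_n the n×n identity, T^{mn}_n a diagonal matrix of twiddle factors, and L^{mn}_m a stride permutation. It assumes the convention ω_n = exp(-2πi/n). Each factor becomes one plain Python function; the helper names are hypothetical and are not Spiral's API. The product, applied right to left, is checked against a naive DFT. The sketch does not reproduce the paper's tagging, parallelization, or rescaling rewrite rules. It only shows the kind of structure those rules operate on: for example, I_m ⊗ DFT_n is m independent DFTs on contiguous blocks, which makes it a natural candidate for splitting across processors.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT: y[l] = sum_k exp(-2*pi*i*k*l/n) * x[k]."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * l / n) for k in range(n))
            for l in range(n)]


def stride_perm(x, m):
    """L^{mn}_m: gather the input at stride m, i.e. u[k1*n + k2] = x[k2*m + k1]."""
    n = len(x) // m
    return [x[k2 * m + k1] for k1 in range(m) for k2 in range(n)]


def blocks_dft(u, m, n):
    """I_m (x) DFT_n: apply DFT_n to each of the m contiguous blocks of length n."""
    out = []
    for k1 in range(m):
        out.extend(dft(u[k1 * n:(k1 + 1) * n]))
    return out


def twiddle(v, m, n):
    """T^{mn}_n: scale entry k1*n + l1 by omega_{mn}^(k1*l1)."""
    big_n = m * n
    return [v[k1 * n + l1] * cmath.exp(-2j * cmath.pi * k1 * l1 / big_n)
            for k1 in range(m) for l1 in range(n)]


def strided_dft(w, m, n):
    """DFT_m (x) I_n: apply DFT_m to each of the n subvectors taken at stride n."""
    y = [0j] * (m * n)
    for l1 in range(n):
        col = dft([w[k1 * n + l1] for k1 in range(m)])
        for l2 in range(m):
            y[l2 * n + l1] = col[l2]
    return y


def cooley_tukey(x, m):
    """DFT_{mn} = (DFT_m (x) I_n) T^{mn}_n (I_m (x) DFT_n) L^{mn}_m, right to left."""
    n = len(x) // m
    if m * n != len(x):
        raise ValueError("m must divide len(x)")
    u = stride_perm(x, m)    # L^{mn}_m
    v = blocks_dft(u, m, n)  # I_m (x) DFT_n
    w = twiddle(v, m, n)     # T^{mn}_n
    return strided_dft(w, m, n)  # DFT_m (x) I_n


if __name__ == "__main__":
    x = [complex(k, k % 3) for k in range(12)]
    err = max(abs(a - b) for a, b in zip(dft(x), cooley_tukey(x, 4)))  # 12 = 4 * 3
    print(err < 1e-9)
```

Applying the rule recursively to the smaller DFTs, and choosing among the possible factorizations of the size, yields the space of alternative algorithms that a search such as Spiral's can explore.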

This work was supported by the Special Research Program SFB F011 “AURORA” and the Erwin Schrödinger Fellowship of the Austrian Science Fund FWF, and in part by DARPA through the Department of Interior grant NBCH1050009 and by NSF through awards 0234293 and 0325687.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bonelli, A., Franchetti, F., Lorenz, J., Püschel, M., Ueberhuber, C.W. (2006). Automatic Performance Optimization of the Discrete Fourier Transform on Distributed Memory Computers. In: Guo, M., Yang, L.T., Di Martino, B., Zima, H.P., Dongarra, J., Tang, F. (eds) Parallel and Distributed Processing and Applications. ISPA 2006. Lecture Notes in Computer Science, vol 4330. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946441_74

  • DOI: https://doi.org/10.1007/11946441_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68067-3

  • Online ISBN: 978-3-540-68070-3

  • eBook Packages: Computer Science, Computer Science (R0)
