Four Easy Ways to a Faster FFT

Journal of Mathematical Modelling and Algorithms

Abstract

The Fast Fourier Transform (FFT) was named one of the Top Ten Algorithms of the 20th century [4] and continues to be a focus of current research. A problem with currently used FFT packages is that they require large, finely tuned, machine-specific libraries produced by highly skilled software developers. As a result, these packages fail to perform well across a variety of architectures. Furthermore, many of them must run repeated experiments to 're-program' their code for optimal performance on a given machine's underlying hardware. Finally, it is difficult to know which radix to use for a particular vector size and machine configuration. We propose monolithic array analysis as a way to remove the performance constraints imposed by a machine's underlying hardware by pre-optimizing array access patterns. In doing so, we arrive at a single optimized program. On our experimental platforms, we have achieved up to a 99.6% increase in performance and the ability to run vectors up to 8 388 608 elements larger. Preliminary experiments indicate that different radices perform better depending on a machine's underlying architecture.
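As a rough illustration of the radix choice raised in the abstract, the following is a minimal sketch of a textbook general-radix, decimation-in-time Cooley-Tukey FFT. It is not the authors' monolithic-array-analysis implementation; the function names `fft_radix` and `dft` are hypothetical, and the sketch assumes the vector length n is a power of the chosen radix r. The strided slice `x[j::r]` is the array access pattern that a radix-r decomposition imposes at each level.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft_radix(x, r=2):
    """Recursive decimation-in-time Cooley-Tukey FFT with radix r.

    Hypothetical helper for illustration; assumes len(x) is a power of r.
    """
    if r < 2:
        raise ValueError("radix must be at least 2")
    n = len(x)
    if n <= 1:
        return list(x)
    if n % r != 0:
        raise ValueError("length must be a power of the radix")
    m = n // r
    # Decimate: transform the r strided subsequences x[j], x[j+r], x[j+2r], ...
    subs = [fft_radix(x[j::r], r) for j in range(r)]
    out = [0j] * n
    for q in range(r):
        for k in range(m):
            idx = k + q * m
            # Combine: X[idx] = sum_j w_n^(j*idx) * S_j[idx mod m], w_n = e^(-2*pi*i/n)
            total = 0j
            for j in range(r):
                total += cmath.exp(-2j * cmath.pi * j * idx / n) * subs[j][k]
            out[idx] = total
    return out


if __name__ == "__main__":
    x = [complex(i % 5, (i * 3) % 7) for i in range(16)]
    ref = dft(x)
    for r in (2, 4, 16):
        err = max(abs(a - b) for a, b in zip(fft_radix(x, r), ref))
        print(f"radix {r}: max error vs. naive DFT = {err:.2e}")
```

Because 16 is a power of 2, 4 and 16, the same input can be transformed with each radix and timed on a given machine, which is one naive way to probe the radix question. The inner combination loop costs O(n·r·log_r n) operations, so this sketch is meant for exposition rather than performance.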

References

  1. OpenMP: Simple, portable, scalable SMP programming, 2000.

  2. Agarwal, R. C., Gustavson, F. G. and Zubair, M.: A high performance parallel algorithm for 1-D FFT, In: Proc., Supercomputing '94, IEEE Computer Society Press, Washington, DC, 1994, pp. 34–40.

  3. Bilmes, J., Asanovic, K., Chin, C.-W. and Demmel, J.: Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology, In: Proc. 1997 International Conference on Supercomputing, Vienna, Austria, July 1997, pp. 340–347.

  4. Center, C. M. H.: Top ten algorithms of the 20th century, Computing Science and Engineering Magazine, 1999.

  5. Chamberlain, B. L., Choi, S.-E., Lewis, C., Snyder, L., Weathersby, W. D. and Lin, C.: The case for high-level parallel programming in ZPL, IEEE Comput. Sci. Engrg. 5(3) (1998), 76–86.

  6. Chamberlain, B. L., Choi, S.-E., Lewis, E. C., Lin, C., Snyder, L. and Weathersby, W. D.: Factor-join: A unique approach to compiling array languages for parallel machines, In: D. Padua, A. Nicolau, D. Gelernter, U. Banerjee and D. Sehr (eds), Proc. Ninth International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Comput. Sci. 1239, Springer-Verlag, New York, 1996, pp. 481–500.

  7. Chamberlain, B. L., Choi, S.-E. and Snyder, L.: A compiler abstraction for machine independent parallel communication generation, In: Z. Li, P. C. Yew, S. Chatterjee, C. H. Huang, P. Sadayappan and D. Sehr (eds), Languages and Compilers for Parallel Computing, Lecture Notes in Comput. Sci. 1366, Springer-Verlag, New York, 1998, pp. 261–276.

  8. Cormen, T.: Everything you always wanted to know about out-of-core FFTs but were afraid to ask, COMPASS Colloquia Series, U Albany, SUNY, 2000.

  9. Culler, D., Karp, R., Patterson, D., Sahay, A., Schauser, K. E., Santos, E., Subramonian, R. and von Eicken, T.: LogP: Toward a realistic model of parallel computation, In: Proc. Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, May 1993, pp. 1–12.

  10. Dai, D. L., Gupta, S. K. S., Kaushik, S. D. and Lu, J. H.: EXTENT: A portable programming environment for designing and implementing high-performance block-recursive algorithms, In: Proc., Supercomputing '94, IEEE Computer Society Press, Washington, DC, 1994, pp. 49–58.

  11. Dooling, D. and Mullin, L.: Indexing and distributing a general partitioned sparse array, Proc. Workshop on Solving Irregular Problems on Distributed Memory Machines, 1995.

  12. Elliott, D. F. and Rao, K. R.: Fast Transforms: Algorithms, Analyses, Applications, Academic Press, New York, 1982.

  13. Frigo, M. and Johnson, S.: FFTW online documentation, Nov. 1999.

  14. Granata, J., Conner, M. and Tolimieri, R.: Recursive fast algorithms and the role of the tensor product, IEEE Trans. Signal Process. 40(12) (1992), 2921–2930.

  15. Gupta, A. and Kumar, V.: The scalability of FFT on parallel computers, IEEE Trans. Parallel and Distributed Systems 4(8) (1993), 922–932.

  16. Gupta, S., Huang, C.-H., Sadayappan, P. and Johnson, R.: On the synthesis of parallel programs from tensor product formulas for block recursive algorithms, In: U. Banerjee, D. Gelernter, A. Nicolau and D. Padua (eds), Proc. 5th International Workshop on Languages and Compilers for Parallel Computing (New Haven, Connecticut), Lecture Notes in Comput. Sci. 757, Springer-Verlag, New York, 1992, pp. 264–280.

  17. Gupta, S. K. S., Huang, C.-H., Sadayappan, P. and Johnson, R. W.: Implementing fast Fourier transforms on distributed-memory multiprocessors using data redistributions, Parallel Processing Lett. 4(4) (1994), 477–488.

  18. Gupta, S. K. S., Huang, C.-H., Sadayappan, P. and Johnson, R. W.: A framework for generating distributed-memory parallel programs for block recursive algorithms, J. Parallel Distributed Comput. 34(2) (1996), 137–153.

  19. Hennessy, J. and Patterson, D.: Computer Architecture a Quantitative Approach, Morgan Kaufmann, California, 1996.

  20. High Performance Fortran Forum. High Performance Fortran language specification, Scientific Programming 2(1-2) (1993), 1–170.

  21. Humphrey, W., Karmesin, S., Bassetti, F. and Reynders, J.: Optimization of data-parallel field expressions in the POOMA framework, In: Y. Ishikawa, R. R. Oldehoeft, J. Reynders and M. Tholburn (eds), Proc. First International Conference on Scientific Computing in Object-Oriented Parallel Environments (ISCOPE '97) (Marina del Rey, CA), Lecture Notes in Comput. Sci. 1343, Springer-Verlag, New York, 1997, pp. 185–194.

  22. Hunt, H., Mullin, L. and Rosenkrantz, D.: A feasibility study on the high level design of both sequential and parallel algorithms applied to the FFT, Paper in progress, Department of CS SUNY, Albany, 2001.

  23. Karmesin, S., Crotinger, J., Cummings, J., Haney, S., Humphrey, W., Reynders, J., Smith, S. and Williams, T.: Array design and expression evaluation in POOMA II, In: D. Caromel, R. R. Oldehoeft and M. Tholburn (eds), Proc. Second International Symposium on Scientific Computing in Object-Oriented Parallel Environments (ISCOPE '98) (Santa Fe, NM), Lecture Notes in Comput. Sci. 1505, Springer-Verlag, New York, 1998, pp.

  24. Li, J. and Skjellum, A.: A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies, Mississippi State Univ., 1995.

  25. Lin, C. and Snyder, L.: ZPL: An array sublanguage, In: U. Banerjee, D. Gelernter, A. Nicolau and D. Padua (eds), Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing (Portland, OR), Lecture Notes in Comput. Sci. 768, Springer-Verlag, New York, 1993, pp. 96–114.

  26. Lumsdaine, A.: The matrix template library: A generic programming approach to high performance numerical linear algebra, In: Proceedings of International Symposium on Computing in Object-Oriented Parallel Environments, 1998.

  27. Lumsdaine, A. and McCandless, B.: Parallel extensions to the matrix template library, In: Proc. 8th SIAM Conference on Parallel Processing for Scientific Computing, SIAM Press, Philadelphia, 1997.

  28. Miles, D.: Compute intensity and the FFT, In: Proc., Supercomputing '93 (Portland, OR), IEEE Computer Society Press, 1993, pp. 676–684.

  29. Mullin, L.: The Psi compiler project, In: Workshop on Compilers for Parallel Computers, TU Delft, Holland, 1993.

  30. Mullin, L.: On the monolithic analysis of a general radix Cooley-Tukey FFT: Design, development, and performance, Invited talk, Lincoln Labs, MIT, 2000.

  31. Mullin, L., Dooling, D., Sandberg, E. and Thibault, S.: Formal methods for portable, scalable, scheduling, routing, and communication protocol, Technical Report CSC 94-04, Dept. of CS, Univ. Missouri-Rolla, 1994.

  32. Mullin, L., Kluge, W. and Scholtz, S.: On programming scientific applications in SAC - a functional language extended by a subsystem for high level array operations, In: Proc. 8th International Workshop on Implementation of Functional Languages, Bonn/Germany, 1996.

  33. Mullin, L. and Small, S.: Three easy steps to a faster FFT (no, we don't need a plan), Proc. 2001 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, SPECTS 2001.

  34. Mullin, L. and Small, S.: Three easy steps to a faster FFT (the story continues...), Proc. International Conference on Imaging Science, Systems, and Technology, CISST 2001.

  35. Mullin, L. M. R.: A mathematics of arrays, PhD thesis, Syracuse Univ., Dec. 1988.

  36. Mullin, L. R., Dooling, D., Sandberg, E. and Thibault, S.: Formal methods for scheduling, routing and communication protocol, In: Proc. Second International Symposium on High Performance Distributed Computing (HPDC-2), IEEE Computer Society, 1993.

  37. Mullin, L. R., Eggleston, D., Woodrum, L. J. and Rennie, W.: The PGI-PSI project: Preprocessing optimizations for existing and new F90 intrinsics in HPF using compositional symmetric indexing of the Psi calculus, In: M. Gerndt (ed.), Proc. 6th Workshop on Compilers for Parallel Computers (Aachen, Germany), Forschungszentrum Jülich GmbH, 1996, pp. 345–355.

  38. Rosenkrantz, D., Mullin, L. and Hunt III, H. B.: On materializations of array-valued temporaries, In: Proc. 13th International Workshop on Languages and Compilers for Parallel Computing 2000 (LCPC'00) (Yorktown Heights, NY), Springer-Verlag, New York, to be published.

  39. Skjellum, A., Doss, N. and Bangalore, P.: Driving issues in scalable libraries: Poly-algorithms, data distribution independence, redistribution, local storage schemes, In: Proc. Seventh SIAM Conference on Parallel Processing for Scientific Computing, SIAM Press, Philadelphia, 1996.

  40. Thibault, S. and Mullin, L.: A pipeline implementation of LU-decomposition on a hypercube, Technical Report, Univ. Missouri-Rolla, 1994, TR 95-03.

  41. Tolimieri, R., An, M. and Lu, C.: Algorithms for Discrete Fourier Transform and Convolution, Springer-Verlag, New York, 1989.

  42. Tolimieri, R., An, M. and Lu, C.: Mathematics of Multidimensional Fourier Transform Algorithms, Springer-Verlag, New York, 1993.

  43. Van Loan, C.: Computational Frameworks for the Fast Fourier Transform, Frontiers in Applied Mathematics, SIAM, Philadelphia, 1992.

  44. Veldhuizen, T.: Using C++ template metaprograms, C++ Report 7(4) (1995), 36–43. Reprinted in C++ Gems (ed. Stanley Lippman).

  45. Veldhuizen, T. L.: Expression templates, C++ Report 7(5) (1995), 26–31. Reprinted in C++ Gems (ed. Stanley Lippman).

  46. Veldhuizen, T. L.: Arrays in Blitz++, In: D. Caromel, R. R. Oldehoeft and M. Tholburn (eds), Proc. Second International Symposium on Scientific Computing in Object-Oriented Parallel Environments (ISCOPE '98) (Santa Fe, NM), Lecture Notes in Comput. Sci. 1505, Springer-Verlag, New York, 1998.

  47. Whaley, R. C. and Dongarra, J. J.: Automatically tuned linear algebra software, Technical Report UT-CS-97-366, Department of Computer Science, Univ. Tennessee, Dec. 1997.

About this article

Cite this article

Mullin, L.R., Small, S.G. Four Easy Ways to a Faster FFT. Journal of Mathematical Modelling and Algorithms 1, 193–214 (2002). https://doi.org/10.1023/A:1020590506372
