Abstract
The Fastest Fourier Transform in the West (FFTW) is an adaptive FFT library that generates highly efficient Discrete Fourier Transform (DFT) implementations. It is one of the fastest FFT libraries available, and it outperforms many adaptive or hand-tuned DFT libraries. Its success relies largely on the huge search space spanned by several FFT algorithms and a set of compiler-generated C routines (called codelets) for small-size DFTs. FFTW empirically finds the best algorithm by measuring the performance of different algorithm combinations. Although empirical search works very well for FFTW, the search process does not explain why the best plan it finds performs best, and the search overhead grows polynomially as the DFT size increases. The alternative to empirical search is model-driven optimization. However, it is widely believed that model-driven optimization is inferior to empirical search and is particularly ill-suited to problems as complex as DFT optimization.
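To make the search space concrete, here is a minimal Python sketch of a toy planner, not FFTW's real one. Assumptions: a plan is a tuple of Cooley-Tukey radices (a hypothetical representation); whatever size remains when the plan runs out goes to a small table of straight-line "codelets" (hand-written here rather than compiler-generated) or to a naive DFT; and the empirical search simply times every candidate plan and keeps the fastest. The names `naive_dft`, `codelet_4`, `CODELETS`, `fft`, `plans`, and `empirical_search` are all illustrative.

```python
import cmath
import time


def naive_dft(x):
    """O(n^2) DFT, used when no codelet exists for the size."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def codelet_4(x):
    """Straight-line size-4 DFT, written by hand in the style of a codelet."""
    a, b, c, d = x
    t0, t1 = a + c, a - c
    t2, t3 = b + d, b - d
    # Multiplying by -1j applies the twiddle factor W_4^1 = exp(-2*pi*i/4).
    return [t0 + t2, t1 - 1j * t3, t0 - t2, t1 + 1j * t3]


# Hypothetical codelet table: sizes that have straight-line code.
CODELETS = {
    1: lambda x: list(x),
    2: lambda x: [x[0] + x[1], x[0] - x[1]],
    4: codelet_4,
}


def fft(x, plan):
    """Mixed-radix decimation-in-time Cooley-Tukey driven by a plan.

    plan is a tuple of radices; each step splits off plan[0], and the size
    left when the plan runs out is handed to a codelet (or to the naive
    DFT if no codelet exists for that size).
    """
    n = len(x)
    if not plan:
        return CODELETS.get(n, naive_dft)(x)
    r = plan[0]
    m = n // r
    # r sub-transforms of size m over the decimated inputs x[q], x[q+r], ...
    subs = [fft(x[q::r], plan[1:]) for q in range(r)]
    out = [0j] * n
    for s in range(r):
        for k in range(m):
            # X[k + s*m] = sum_q W_n^(q*(k + s*m)) * Y_q[k]
            idx = k + s * m
            out[idx] = sum(subs[q][k] * cmath.exp(-2j * cmath.pi * q * idx / n)
                           for q in range(r))
    return out


def plans(n):
    """Every ordered radix sequence for size n (the empty plan included)."""
    yield ()
    for r in range(2, n + 1):
        if n % r == 0:
            for rest in plans(n // r):
                yield (r,) + rest


def empirical_search(n, trials=3):
    """Time every candidate plan and return (fastest plan, seconds per run)."""
    x = [complex(i % 7, i % 5) for i in range(n)]
    best_plan, best_time = None, float("inf")
    for p in plans(n):
        start = time.perf_counter()
        for _ in range(trials):
            fft(x, p)
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_time:
            best_plan, best_time = p, elapsed
    return best_plan, best_time
```

Even for n = 12 this toy space already holds 16 candidate plans, and every candidate costs a timing run. The number of candidates grows with the number of ordered factorizations of n, which is the kind of search overhead a model-driven predictor aims to avoid.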
In this paper, we propose a model-driven DFT performance predictor that can replace the empirical search engine in FFTW. Our technique adapts to different architectures and automatically predicts the performance of DFT algorithms and codelets (including SIMD codelets). Our experiments on four test platforms show that this technique produces DFT implementations that achieve more than 95% of the performance of the original FFTW while using less than 5% of its search overhead. More importantly, our models give insight into why different combinations of DFT algorithms perform differently on a processor, given its architectural features.
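By contrast, a model-driven planner ranks plans with an analytical cost function and never runs them. The sketch below is not the authors' model: it is a deliberately crude, hypothetical operation-count model over the same kind of toy plan space (a plan is a tuple of Cooley-Tukey radices, with the leftover size solved by a base-case DFT). The weights `combine_weight`, `base_weight`, and `call_overhead` are assumed stand-ins for the architecture-specific parameters a real predictor would have to calibrate per machine.

```python
def plans(n):
    """Every ordered radix sequence for size n (the empty plan included)."""
    yield ()
    for r in range(2, n + 1):
        if n % r == 0:
            for rest in plans(n // r):
                yield (r,) + rest


def predicted_cost(n, plan, combine_weight=1.0, base_weight=1.0, call_overhead=0.0):
    """Hypothetical analytical cost: operation counts scaled by machine weights.

    Each radix-r Cooley-Tukey step does r multiply-adds for each of the n
    outputs (n * r in total); the leftover size L is solved by n / L base-case
    DFTs modeled as L^2 each (n * L in total); call_overhead charges every
    recursive call as a crude stand-in for loop and codelet-invocation cost.
    """
    calls = 1                                 # the top-level call
    cost = call_overhead * calls
    size = n
    for r in plan:
        cost += combine_weight * n * r        # one combine pass over all n points
        size //= r
        calls *= r                            # r child calls per call at this level
        cost += call_overhead * calls
    cost += base_weight * n * size            # leaf DFTs of the leftover size
    return cost


def model_driven_plan(n, **weights):
    """Rank plans by predicted cost alone; no plan is ever executed."""
    return min(plans(n), key=lambda p: predicted_cost(n, p, **weights))
```

With the default weights, several plans for n = 16 tie at a predicted cost of 128 and `min` keeps the first one, `(2, 2)`; setting `call_overhead=1.0` breaks the tie in favor of `(4,)`, which makes the fewest recursive calls among them. Because each cost term can be inspected separately, even a toy model like this can say why one plan is preferred, which raw timings alone cannot.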
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gu, L., Li, X. (2010). DFT Performance Prediction in FFTW. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_10
DOI: https://doi.org/10.1007/978-3-642-13374-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science; Computer Science (R0)