
Influence of memory access patterns to small-scale FFT performance

The Journal of Supercomputing

Abstract

Modern GPUs (Graphics Processing Units) offer very high computing power at relatively low cost. To take advantage of their computing resources and develop efficient implementations, it is essential to have some knowledge of the architecture and the memory hierarchy. In this paper, we use the FFT (Fast Fourier Transform) as a benchmark tool to analyze different aspects of GPU architectures, such as the influence of the memory access pattern and the impact of register pressure. The FFT is a good tool for performance analysis because it is used in many digital signal processing applications and has a good balance between computational cost and memory bandwidth requirements.
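To make the memory-access and compute/bandwidth arguments concrete, the following Python sketch models how a naive GPU mapping of a radix-2 FFT touches memory. It is an illustrative toy, not the authors' kernels. It assumes one butterfly per thread with consecutive threads handling consecutive butterflies, a 32-thread warp, 128-byte memory segments, single-precision complex data (8 bytes per element), and an array aligned to a segment boundary. All function names are hypothetical. It also estimates arithmetic intensity with the conventional 5N log2 N flop count for a transform that reads and writes its data only once, which is the regime of a small FFT held entirely in on-chip memory.

```python
"""Toy model of radix-2 FFT memory access on a GPU (illustrative assumptions only)."""
import math

# Hypothetical machine parameters, chosen for illustration:
WARP_SIZE = 32        # threads that issue one load instruction together
SEGMENT_BYTES = 128   # granularity of a memory transaction
ELEM_BYTES = 8        # single-precision complex value (two 4-byte floats)


def top_index(butterfly: int, half: int) -> int:
    """Index of the first operand of a radix-2 butterfly.

    At a stage where butterflies combine elements `half` apart
    (half = 1, 2, 4, ..., N/2), butterfly b pairs element i with i + half.
    """
    group, offset = divmod(butterfly, half)
    return group * 2 * half + offset


def segments_per_warp_load(half: int) -> int:
    """Distinct memory segments touched when a warp loads its first operands."""
    # Assumes the array starts at a segment boundary (element 0 at byte 0).
    indices = [top_index(b, half) for b in range(WARP_SIZE)]
    return len({i * ELEM_BYTES // SEGMENT_BYTES for i in indices})


def load_efficiency(half: int) -> float:
    """Fraction of transferred bytes that the warp actually uses."""
    useful_bytes = WARP_SIZE * ELEM_BYTES
    return useful_bytes / (segments_per_warp_load(half) * SEGMENT_BYTES)


def arithmetic_intensity(n: int) -> float:
    """Flops per byte for an n-point FFT that reads and writes its data once,
    using the conventional 5 * n * log2(n) flop count."""
    flops = 5 * n * math.log2(n)
    bytes_moved = 2 * n * ELEM_BYTES
    return flops / bytes_moved


if __name__ == "__main__":
    n = 1024
    half = 1
    while half < n:
        print(f"half={half:4d}  segments={segments_per_warp_load(half)}  "
              f"efficiency={load_efficiency(half):.2f}")
        half *= 2
    print(f"arithmetic intensity for n={n}: {arithmetic_intensity(n):.3f} flop/byte")
```

Under these assumptions, in the early stages (butterfly distance 1 to 8) each warp-wide load of the first operands uses only half of the bytes it transfers, while later stages load fully used segments. A 1024-point transform performs about 3.1 flops per byte of off-chip traffic. Real kernels change these figures by reordering data between stages, using higher radices, or assigning several elements per thread, which in turn raises register pressure.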





Acknowledgements

This work was supported by the Xunta de Galicia under projects 08TIC001206PR and INCITE08PXIB105161PR, the Ministry of Science and Innovation, cofunded by the FEDER funds of the European Union under the grant TIN2010-16735, and the Consolidation of Competitive Research Groups ref. 2010/06.

Author information

Correspondence to J. Lobeiras.


About this article

Cite this article

Lobeiras, J., Amor, M. & Doallo, R. Influence of memory access patterns to small-scale FFT performance. J Supercomput 64, 120–131 (2013). https://doi.org/10.1007/s11227-012-0807-5

  • DOI: https://doi.org/10.1007/s11227-012-0807-5