Influence of memory access patterns to small-scale FFT performance

Lobeiras, J.; Amor, M.; Doallo, R.

doi:10.1007/s11227-012-0807-5

Published: 05 July 2012

Volume 64, pages 120–131, (2013)

The Journal of Supercomputing

Abstract

Modern GPUs (Graphics Processing Units) offer very high computing power at relatively low cost. To take advantage of their computing resources and develop efficient implementations, it is essential to have some knowledge of the architecture and memory hierarchy. In this paper, we use the FFT (Fast Fourier Transform) as a benchmark tool to analyze different aspects of GPU architectures, such as the influence of the memory access pattern or the impact of register pressure. The FFT is a good tool for performance analysis because it is used in many digital signal processing applications and has a good balance between computational cost and memory bandwidth requirements.
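
To illustrate why the FFT exercises a range of memory access patterns, the following is a minimal CPU-side sketch of a textbook iterative radix-2 decimation-in-time FFT. It is not the authors' GPU implementation; the function names `bit_reverse_permute` and `fft_radix2` are hypothetical and chosen for this example only. The sketch records the distance between the two operands of each butterfly, which doubles from one stage to the next in this formulation.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1  # log2(n) for a power-of-two length
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = x[i]
    return out


def fft_radix2(x):
    """Iterative radix-2 DIT FFT (illustrative helper, not from the paper).

    Returns the spectrum and the butterfly operand distance of each stage.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = bit_reverse_permute([complex(v) for v in x])
    strides = []
    half = 1
    while half < n:
        strides.append(half)  # distance between the two butterfly operands
        w_step = cmath.exp(-2j * cmath.pi / (2 * half))
        for start in range(0, n, 2 * half):
            w = 1 + 0j
            for k in range(half):
                lo, hi = start + k, start + k + half
                t = w * a[hi]
                a[hi] = a[lo] - t
                a[lo] = a[lo] + t
                w *= w_step
        half *= 2
    return a, strides


spectrum, strides = fft_radix2([1, 2, 3, 4, 0, 0, 0, 0])
print(strides)  # [1, 2, 4]
```

Because every stage reads and writes the whole array but with a different operand distance, the same arithmetic is paired with a different access pattern at each stage, which is one reason the transform is a convenient probe of a memory hierarchy.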

Acknowledgements

This work was supported by the Xunta de Galicia under projects 08TIC001206PR and INCITE08PXIB105161PR, the Ministry of Science and Innovation, cofunded by the FEDER funds of the European Union under the grant TIN2010-16735, and the Consolidation of Competitive Research Groups ref. 2010/06.

Author information

Authors and Affiliations

Computer Architecture Group (GAC), University of A Coruña (UDC), A Coruña, Spain
J. Lobeiras, M. Amor & R. Doallo

Corresponding author

Correspondence to J. Lobeiras.

About this article

Cite this article

Lobeiras, J., Amor, M. & Doallo, R. Influence of memory access patterns to small-scale FFT performance. J Supercomput 64, 120–131 (2013). https://doi.org/10.1007/s11227-012-0807-5

Issue Date: April 2013
