Abstract
The article proposes a cost model for two implementations of the all-pairs-shortest-path (APSP) problem on graphics processing units (GPUs), with a particular focus on correctly reproducing the shape of the runtime curve across different input problem sizes and thread-block configurations. The model categorizes thread blocks by their number of active warps and defines four approaches to estimating how thread blocks are mapped onto the hardware execution resources. Each approach yields a runtime prediction, and together these predictions span an interval for the expected execution time. An experimental evaluation shows that most characteristics of the runtime curve are predicted correctly, which is especially important for small graphs, where a tiny increase in input size may cause a significant increase in execution time. For large graphs, the cost prediction deviates by less than 1 % from the measured times in many cases.
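For context, the classic sequential baseline for APSP is the Floyd–Warshall algorithm, whose two inner loops are independent for a fixed pivot and are therefore the natural unit of work to distribute over GPU thread blocks. The sketch below is an illustrative CPU reference only; it is not the authors' GPU kernels, whose blocking and configuration the article's cost model addresses.

```python
INF = float("inf")

def floyd_warshall(dist):
    """In-place APSP on an n x n matrix of edge weights.

    dist[i][j] holds the weight of edge i -> j, INF if the edge is
    absent, and 0 on the diagonal. After the call, dist[i][j] is the
    length of the shortest path from i to j.
    """
    n = len(dist)
    for k in range(n):              # pivot vertex (sequential dependency)
        for i in range(n):          # rows are independent for fixed k;
            dik = dist[i][k]        # on a GPU, one thread per (i, j) pair
            if dik == INF:
                continue            # no path through k from this row
            for j in range(n):
                via_k = dik + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

The outer loop over the pivot `k` must run sequentially, so a GPU implementation launches one kernel (or one grid) per pivot; the number and shape of the thread blocks in that grid is exactly the configuration parameter whose effect on runtime the article's interval model predicts.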
Dümmler, J., Egerland, S. Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs. J Supercomput 71, 4192–4214 (2015). https://doi.org/10.1007/s11227-015-1514-9