Skip to main content
Log in

Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

The article proposes a cost model for two implementations of the all-pairs-shortest-path (APSP) problem on graphics processing units (GPUs) with a particular focus on the correct reproduction of the shape of the runtime curve for different input problem sizes and thread block configurations. The model categorizes thread blocks according to the number of active warps and defines four different approaches to estimate the mapping of thread blocks to the hardware execution resources. Each of these approaches leads to a runtime prediction, which together span an interval for the expected execution time. An experimental evaluation shows that most characteristics of the runtime curve are predicted correctly, which is especially important for small graphs where a tiny increase of the input size may result in a significant increase of the execution time. For large graphs, we show that the cost prediction deviates less than 1 % from the measured times in many cases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Baghsorkhi S, Delahaye M, Patel S, Gropp W, Hwu WmW (2010) An adaptive performance modeling tool for GPU architectures. SIGPLAN Not 45(5):105–114

    Article  Google Scholar 

  2. Brodtkorb A, Hagen T, Sætra M (2013) Graphics processing unit (GPU) programming strategies and trends in GPU computing. J Parallel Distrib Comput 73(1):4–13

    Article  Google Scholar 

  3. Buluç A, Gilbert J, Budak C (2010) Solving path problems on the GPU. Parallel Comput 36(5–6):241–253

    Article  MATH  Google Scholar 

  4. Che S, Skadron K (2014) BenchFriend: correlating the performance of GPU benchmarks. Int J High Perform Comput Appl 28(2):238–250

    Article  Google Scholar 

  5. Guo P, Wang L (2014) Accurate cross architecture performance modeling for sparse matrix—vector multiplication (SpMV) on GPUs. Concurr Comput Pract Exp 27(13):3281–3294. doi:10.1002/cpe.3217

  6. Harish P, Narayanan P (2007) Accelerating large graph algorithms on the GPU using CUDA. In: Proceedings of the 14th international conference on high performance computing (HiPC ’07). Springer, Berlin pp 197–208

  7. Hasan K, Chatterjee A, Radhakrishnan S, Antonio J (2014) Performance prediction model and analysis for compute-intensive tasks on GPUs. In: Hsu CH, Shi X, Salapura V (eds) Proceedings of the 11th IFIP international conference on network and parallel computing (NPC’14). Lecture notes in computer science, vol 8707. Springer, Berlin, pp 612–617

  8. Hong S, Kim H (2009) An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput Archit News 37(3):152–163

    Article  MathSciNet  Google Scholar 

  9. Hu Z, Liu G, Dong W (2014) A throughput-aware analytical performance model for GPU applications. In: Proceedings of the 10th annual conference on advanced computer architecture (ACA ’14). Springer, Berlin, pp 98–112

  10. Karami A, Mirsoleimani S, Khunjush F (2013) A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs. In: Proceedings of the 17th CSI international symposium on computer architecture and digital systems (CADS ’13). IEEE, pp 15–22

  11. Kerr A, Diamos G, Yalamanchili S (2010) Modeling GPU–CPU workloads and systems. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units (GPGPU ’10). ACM, New York, pp 31–42

  12. Khronos OpenCL Working Group (2013) The OpenCL specification, Version 2.0. http://www.khronos.org/opencl

  13. Kothapalli K, Mukherjee R, Rehman M, Patidar S, Narayanan P, Srinathan K (2009) A performance prediction model for the CUDA GPGPU platform. In: Proceedings of the 2009 international conference on high performance computing (HiPC ’09). IEEE, pp 463–472

  14. Lai J, Seznec A (2013) Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. In: Proceedings of the 2013 IEEE/ACM international symposium on code generation and optimization (CGO ’13). IEEE Computer Society, Washington, pp 1–10

  15. Lawler E (1976) Combinatorial optimization: networks and matroids. Holt, Rinehart and Winston, New York

  16. Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2014) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst PP(99):1–1

    Google Scholar 

  17. Ma L, Chamberlain R, Agrawal K (2014) Performance modeling for highly-threaded many-core GPUs. In: Proceedings of the 25th international conference on application-specific systems, architectures and processors (ASAP ’14). IEEE, pp. 84–91

  18. Meng J, Morozov V, Kumaran K, Vishwanath V, Uram T (2011) GROPHECY: GPU performance projection from CPU code skeletons. In: Proceedings of the 2011 international conference for high performance computing, networking, storage and analysis (SC ’11). ACM, New York, pp 14:1–14:11

  19. Nvidia: CUDA occupancy calculator. http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls

  20. Nvidia (2015) CUDA toolkit documentation, Version 7.0. http://docs.nvidia.com/cuda

  21. Sato K, Komatsu K, Takizawa H, Kobayashi H (2011) A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems. In: Proceedings of the 9th international symposium on parallel and distributed processing with applications (ISPA ’11). IEEE Computer Society, Washington, pp 135–142

  22. Sim J, Dasgupta A, Kim H, Vuduc R (2012) A performance analysis framework for identifying potential benefits in GPGPU applications. SIGPLAN Not 47(8):11–22

    Article  Google Scholar 

  23. Torres Y, Gonzalez-Escribano A, Llanos D (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163

    Article  Google Scholar 

  24. Tran QN (2010) Designing efficient many-core parallel algorithms for all-pairs shortest-paths using CUDA. In: Proceedings of the 7th international conference on information technology: new generations (ITNG ’10). IEEE Computer Society, Washington, pp 7–12

  25. Zhang Y, Owens J (2011) A quantitative performance analysis model for GPU architectures. In: Proceedings of the 17th IEEE international symposium on high performance computer architecture (HPCA ’11). IEEE Computer Society, Washington, pp 382–393

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jörg Dümmler.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Dümmler, J., Egerland, S. Interval-based performance modeling for the all-pairs-shortest-path problem on GPUs. J Supercomput 71, 4192–4214 (2015). https://doi.org/10.1007/s11227-015-1514-9

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-015-1514-9

Keywords

Navigation