Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria

Ortega-Arranz, Hector; Torres, Yuri; Gonzalez-Escribano, Arturo; Llanos, Diego R.

doi:10.1007/s11227-014-1212-z

Published: 18 May 2014

The Journal of Supercomputing, Volume 70, pages 786–798 (2014)

Abstract

In recent years, GPU manycore devices have proven useful for accelerating computationally intensive problems. Although parallelizing a highly parallel algorithm is an affordable task, optimizing GPU codes is challenging. The main reason is the large number of parameters, programming choices, and tuning techniques available, many of them related to complex and sometimes hidden architectural details. A useful strategy for attacking these optimization problems systematically is to characterize the different kernels of the application and use this knowledge to select appropriate configuration parameters. The All-Pair Shortest-Path (APSP) problem is a well-known problem in graph theory whose objective is to find the shortest paths between every pair of nodes in a graph. This problem can be solved by highly parallel, computationally intensive tasks, which makes it a good candidate for exploitation by manycore devices. In this paper, we use kernel characterization criteria to optimize an APSP algorithm implementation for NVIDIA GPUs. Our experimental results show that the combined use of proper configuration policies and the concurrent-kernel capability of new CUDA architectures leads to a performance improvement of up to 62% with respect to a baseline configuration, one of the possible configurations recommended by CUDA.
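The abstract states that APSP can be solved by highly parallel tasks. One common decomposition, assumed here rather than taken from the paper, treats APSP as one independent single-source shortest-path (SSSP) search per node, using Dijkstra's algorithm for each search. The minimal sequential Python sketch below only shows that structure: the function names `sssp` and `apsp` and the adjacency-list format are hypothetical, and the code does not model the GPU kernels, configuration policies, or concurrent-kernel execution evaluated by the authors.

```python
import heapq
from math import inf

# Hypothetical helper names (sssp, apsp): a sequential sketch, not the authors' GPU code.
# Assumed graph format: adj[u] is a list of (v, weight) pairs with weight >= 0.


def sssp(adj, source):
    """Single-source shortest paths from `source` using Dijkstra's algorithm."""
    dist = [inf] * len(adj)
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry: u was already settled with a shorter distance
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apsp(adj):
    """All-pairs shortest paths as one independent SSSP search per source node."""
    # Each search reads the shared graph and produces only its own row of the
    # distance matrix, so the searches could run concurrently (for example on
    # separate GPU thread blocks or concurrent kernels); here they run in sequence.
    return [sssp(adj, s) for s in range(len(adj))]


# Small directed example graph with four nodes.
adj = [
    [(1, 4), (2, 1)],  # node 0
    [(3, 1)],          # node 1
    [(1, 2)],          # node 2
    [],                # node 3
]

for row in apsp(adj):
    print(row)  # unreachable nodes are reported as inf
```

Because every source produces its own row of the distance matrix and the searches share no writes, this style of decomposition exposes coarse-grained parallelism that manycore devices can exploit. Note that Dijkstra's algorithm requires non-negative edge weights.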

Similar content being viewed by others

Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction

Article 01 November 2017

Thi Yen Phuong, Deok-Young Lee & Jeong-Gun Lee

OpenCL Kernel Optimization Metrics for CPU-GPU Architecture

Kernel concurrency opportunities based on GPU benchmarks characterization

Article 17 January 2019

Pablo Carvalho, Rommel Cruz, … Leandro A. J. Marzulo

Acknowledgments

This research has been partially supported by Ministerio de Economía y Competitividad (Spain) and ERDF program of the European Union: CAPAP-H4 network (TIN2011-15734-E), MOGECOPP project (TIN2011-25639); and Junta de Castilla y León (Spain) ATLAS project (VA172A12-2).

Author information

Authors and Affiliations

Dpto. Informática, Universidad de Valladolid, Valladolid, Spain
Hector Ortega-Arranz, Yuri Torres, Arturo Gonzalez-Escribano & Diego R. Llanos

Corresponding author

Correspondence to Diego R. Llanos.

About this article

Cite this article

Ortega-Arranz, H., Torres, Y., Gonzalez-Escribano, A. et al. Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria. J Supercomput 70, 786–798 (2014). https://doi.org/10.1007/s11227-014-1212-z

Published: 18 May 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11227-014-1212-z
