Abstract
Today, one of the main challenges for high-performance computing systems is to improve their performance while keeping energy consumption at acceptable levels. In this context, a consolidated strategy consists of using accelerators such as GPUs or many-core Intel Xeon Phi processors. In this work, devices based on the NVIDIA Pascal and Intel Xeon Phi Knights Landing architectures are described and compared. Selecting the Floyd-Warshall algorithm as a representative case of graph and memory-bound applications, optimized implementations were developed to analyze and compare performance and energy efficiency on both devices. As expected, the Xeon Phi proved superior for double-precision data. However, contrary to the conclusions of our preliminary analysis, the performance and energy efficiency of both devices were found to be comparable when single-precision data were used.
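The kernel under study is the Floyd-Warshall all-pairs shortest-paths recurrence. The following is a minimal sketch of that recurrence over a dense distance matrix; it is not the optimized blocked implementation evaluated in the paper, and the function and variable names are illustrative only.

#include <stddef.h>

/* Minimal sketch of the Floyd-Warshall recurrence on a dense n x n
 * distance matrix stored in row-major order. It only illustrates the
 * memory-bound triple loop that the Xeon Phi and GPU versions
 * parallelize and block; it is not the paper's optimized code. */
void floyd_warshall(float *dist, size_t n)
{
    for (size_t k = 0; k < n; k++)           /* intermediate vertex  */
        for (size_t i = 0; i < n; i++)       /* source vertex        */
            for (size_t j = 0; j < n; j++) { /* destination vertex   */
                float through_k = dist[i * n + k] + dist[k * n + j];
                if (through_k < dist[i * n + j])
                    dist[i * n + j] = through_k;
            }
}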
Notes
- 1.
- 2. Top500 www.top500.org.
- 3. If there is no path between nodes i and j, their distance is considered to be infinite (usually represented as the largest positive value); see the sketch after this list.
- 4. The characteristics of each platform were described at the end of Sect. 2.1.
- 5. Intel Performance Counter Monitor: http://www.intel.com/software/pcm.
- 6. NVIDIA System Management Interface: https://developer.nvidia.com/nvidia-system-management-interface.
- 7. Naturally, it is also possible to develop an implementation that processes the matrix in parts and does not have this memory limitation. However, the need to run I/O operations for each round would significantly degrade performance.
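Note 3 above describes representing the absence of a path by the largest positive value. The sketch below illustrates such an initialization of the dense distance matrix; the edge-list input format, the function name, and the use of FLT_MAX are assumptions made for this example, not the paper's interface.

#include <float.h>
#include <stddef.h>

/* Illustrative initialization (see note 3): vertex pairs with no
 * connecting edge start at an "infinite" distance, here the largest
 * positive float. In practice a value below FLT_MAX/2 may be preferred
 * so that adding two such distances during relaxation cannot overflow. */
void init_distances(float *dist, size_t n,
                    const size_t *src, const size_t *dst,
                    const float *weight, size_t n_edges)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dist[i * n + j] = (i == j) ? 0.0f : FLT_MAX; /* no path known */

    for (size_t e = 0; e < n_edges; e++)  /* fill in known edge weights */
        dist[src[e] * n + dst[e]] = weight[e];
}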
Acknowledgments
The authors are grateful for the support of NVIDIA through the donation of the Titan X GPU used in this research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Costanzo, M., Rucci, E., Costi, U., Chichizola, F., Naiouf, M. (2021). Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal. In: Pesado, P., Eterovic, J. (eds) Computer Science – CACIC 2020. CACIC 2020. Communications in Computer and Information Science, vol 1409. Springer, Cham. https://doi.org/10.1007/978-3-030-75836-3_3
DOI: https://doi.org/10.1007/978-3-030-75836-3_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75835-6
Online ISBN: 978-3-030-75836-3
eBook Packages: Computer Science, Computer Science (R0)