Abstract
Today, one of the main challenges for high-performance computing systems is to improve their performance while keeping energy consumption at acceptable levels. In this context, a consolidated strategy consists of using accelerators such as GPUs or many-core Intel Xeon Phi processors. In this work, devices based on the NVIDIA Pascal and Intel Xeon Phi Knights Landing architectures are described and compared. Selecting the Floyd-Warshall algorithm as a representative case of graph and memory-bound applications, optimized implementations were developed to analyze and compare performance and energy efficiency on both devices. As expected, the Xeon Phi proved superior for double-precision data. However, contrary to the conclusions of our preliminary analysis, the performance and energy efficiency of both devices were found to be comparable when single-precision data were used.
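The kernel under study is the Floyd-Warshall all-pairs shortest-paths recurrence. The following is a minimal sketch of that recurrence over a dense distance matrix; it is not the optimized blocked implementation evaluated in the paper, and the function and variable names are illustrative only.

#include <stddef.h>

/* Minimal sketch of the Floyd-Warshall recurrence on a dense n x n
 * distance matrix stored in row-major order. It only illustrates the
 * memory-bound triple loop that the Xeon Phi and GPU versions
 * parallelize and block; it is not the paper's optimized code. */
void floyd_warshall(float *dist, size_t n)
{
    for (size_t k = 0; k < n; k++)           /* intermediate vertex  */
        for (size_t i = 0; i < n; i++)       /* source vertex        */
            for (size_t j = 0; j < n; j++) { /* destination vertex   */
                float through_k = dist[i * n + k] + dist[k * n + j];
                if (through_k < dist[i * n + j])
                    dist[i * n + j] = through_k;
            }
}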
Notes
- 1.
- 2. Top500 www.top500.org.
- 3. If there is no path between nodes i and j, their distance is considered to be infinite (usually represented as the largest positive value); see the sketch after this list.
- 4. The characteristics of each platform were described at the end of Sect. 2.1.
- 5. Intel Performance Counter Monitor: http://www.intel.com/software/pcm.
- 6. NVIDIA System Management Interface: https://developer.nvidia.com/nvidia-system-management-interface.
- 7. Naturally, it is also possible to develop an implementation that processes the matrix in parts and does not have this memory limitation. However, the need to run I/O operations for each round would significantly degrade performance.
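Note 3 above describes representing the absence of a path by the largest positive value. The sketch below illustrates such an initialization of the dense distance matrix; the edge-list input format, the function name, and the use of FLT_MAX are assumptions made for this example, not the paper's interface.

#include <float.h>
#include <stddef.h>

/* Illustrative initialization (see note 3): vertex pairs with no
 * connecting edge start at an "infinite" distance, here the largest
 * positive float. In practice a value below FLT_MAX/2 may be preferred
 * so that adding two such distances during relaxation cannot overflow. */
void init_distances(float *dist, size_t n,
                    const size_t *src, const size_t *dst,
                    const float *weight, size_t n_edges)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dist[i * n + j] = (i == j) ? 0.0f : FLT_MAX; /* no path known */

    for (size_t e = 0; e < n_edges; e++)  /* fill in known edge weights */
        dist[src[e] * n + dst[e]] = weight[e];
}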
Acknowledgments
The authors are grateful for the support of NVIDIA through the donation of the Titan X GPU used in this research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Costanzo, M., Rucci, E., Costi, U., Chichizola, F., Naiouf, M. (2021). Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal. In: Pesado, P., Eterovic, J. (eds) Computer Science – CACIC 2020. CACIC 2020. Communications in Computer and Information Science, vol 1409. Springer, Cham. https://doi.org/10.1007/978-3-030-75836-3_3
DOI: https://doi.org/10.1007/978-3-030-75836-3_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75835-6
Online ISBN: 978-3-030-75836-3
eBook Packages: Computer Science, Computer Science (R0)