Abstract
Manycore processors are consolidating their position in the HPC community as a way of improving performance while maintaining power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first-generation Xeon Phi devices has been widely studied in recent years, the new features of Knights Landing processors require a revision of programming and optimization techniques for these devices. In this work, we select the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data-, thread- and compiler-level optimizations help the parallel implementation reach 338 GFLOPS.
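For readers unfamiliar with the algorithm, the following is a minimal sketch of a plain serial Floyd-Warshall, the kind of baseline the abstract refers to as the default serial version. It is not the authors' code: the function name `floyd_warshall`, the dense list-of-lists representation, and the small example graph are assumptions made purely for illustration, and Python is used only for readability.

```python
import math

INF = math.inf  # marks "no edge" in the dense distance matrix


def floyd_warshall(dist):
    """Return all-pairs shortest-path distances for a dense matrix.

    dist[i][j] holds the weight of edge i -> j, INF if the edge is
    absent, and 0 on the diagonal. The input is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):            # allow vertex k as an intermediate hop
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == INF:        # no path i -> k, nothing can improve
                continue
            di = d[i]
            for j in range(n):
                cand = dik + dk[j]
                if cand < di[j]:  # shorter path through k found
                    di[j] = cand
    return d


# Hypothetical 4-vertex directed graph used only as an example.
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
result = floyd_warshall(graph)
```

The three nested loops sweep the whole distance matrix once per intermediate vertex, which is consistent with the abstract's description of the problem as memory-bound and is the access pattern that data-level optimizations aim to improve.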
Acknowledgments
The authors thank the ArTeCS Group from Universidad Complutense de Madrid for letting us use their Xeon Phi KNL system.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Rucci, E., De Giusti, A., Naiouf, M. (2018). Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study. In: De Giusti, A. (eds) Computer Science – CACIC 2017. CACIC 2017. Communications in Computer and Information Science, vol 790. Springer, Cham. https://doi.org/10.1007/978-3-319-75214-3_5
DOI: https://doi.org/10.1007/978-3-319-75214-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75213-6
Online ISBN: 978-3-319-75214-3
eBook Packages: Computer Science, Computer Science (R0)