Abstract
ARM processors are now widely used in HPC applications. Although ARM popularity is rapidly increasing, there is still a significant lack of detailed performance evaluation of such systems on various workloads. Unlike existing approaches to performance evaluation, this paper covers the methodology of creating a comprehensive benchmarking set, which allows us to present a detailed performance comparison of Kunpeng 920-6426 and Intel Xeon 6140 processors. The developed benchmarks are based on relatively simple fragments of code that are frequently used in many scientific and real-world applications. For each benchmark we provide a detailed scalability and performance analysis based on the top-down and roofline performance models, which allow us to identify bottlenecks and assess implementation efficiency. The evaluation results demonstrate that Kunpeng 920 outperforms Intel Xeon 6140 processors on various cache-bound and memory-bound applications, such as stencil kernels and operations with dense matrices and vectors. At the same time, Kunpeng 920 demonstrates lower performance on compute-bound problems that can be vectorised and on problems involving indirect memory accesses, such as graph algorithms.
Acknowledgments
The study presented in Sects. 5.7 and 5.8, concerning the performance evaluation of the VGL framework, was supported by the Russian Ministry of Science and Higher Education, agreement No. 075-15-2019-1621. The work presented in all sections except 5.7 and 5.8 was supported by Huawei Technologies Co., Ltd. (Project No. OAA20100800391587A).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Afanasyev, I., Lichmanov, D. (2021). Evaluating the Performance of Kunpeng 920 Processors on Modern HPC Applications. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_23
DOI: https://doi.org/10.1007/978-3-030-86359-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86358-6
Online ISBN: 978-3-030-86359-3
eBook Packages: Computer Science, Computer Science (R0)