Evaluating the Performance of Kunpeng 920 Processors on Modern HPC Applications

Afanasyev, Ilya; Lichmanov, Dmitry

doi:10.1007/978-3-030-86359-3_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12942)

Included in the following conference series:

International Conference on Parallel Computing Technologies

Abstract

ARM processors are now widely used in HPC applications. Although ARM's popularity is rapidly increasing, there is still a significant lack of detailed performance evaluations of such systems on various workloads. Unlike existing approaches to performance evaluation, this paper covers the methodology of creating a comprehensive benchmarking set, which allows us to present a detailed performance comparison of Kunpeng 920-6426 and Intel Xeon 6140 processors. The developed benchmarks are based on relatively simple code fragments that are frequently used in many scientific and real-world applications. For each benchmark we provide a detailed scalability and performance analysis based on the top-down and roofline performance models, which make it possible to identify bottlenecks and assess implementation efficiency. The evaluation results demonstrate that Kunpeng 920 processors outperform Intel Xeon 6140 processors on various cache-bound and memory-bound applications, such as stencil kernels and operations on dense matrices and vectors. At the same time, Kunpeng 920 processors demonstrate lower performance on compute-bound problems that can be vectorised and on problems involving indirect memory accesses, such as graph algorithms.
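The roofline model mentioned in the abstract bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch of that bound, not the authors' tooling: the function names and the peak and bandwidth figures are hypothetical placeholders and are not measurements of either processor.

```python
# Minimal roofline sketch. All machine numbers below are hypothetical
# placeholders, NOT measured values for Kunpeng 920 or Intel Xeon 6140.

def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity is given in FLOP per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


def classify(peak_gflops, bandwidth_gbs, intensity):
    """Label a kernel by which roof limits it under this model."""
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bandwidth = 1000.0, 100.0

    # A triad-like kernel a[i] = b[i] + s * c[i] on doubles performs
    # 2 FLOP per element and moves 3 * 8 = 24 bytes per element.
    triad_intensity = 2.0 / 24.0

    print(classify(peak, bandwidth, triad_intensity))
    print(attainable_gflops(peak, bandwidth, triad_intensity))
```

Under this model, a kernel whose intensity lies below the ridge point is limited by memory bandwidth, so only a higher-bandwidth memory subsystem or a more data-efficient implementation raises its ceiling.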

Acknowledgments

The study presented in Sects. 5.7 and 5.8, concerning the performance evaluation of the VGL framework, was supported by the Russian Ministry of Science and Higher Education, agreement No. 075-15-2019-1621. The work presented in all sections except 5.7 and 5.8 was supported by Huawei Technologies Co., Ltd. (Project No. OAA20100800391587A).

Author information

Authors and Affiliations

Moscow Center of Fundamental and Applied Mathematics, Moscow, 119991, Russia
Ilya Afanasyev & Dmitry Lichmanov

Corresponding author

Correspondence to Ilya Afanasyev.

Editor information

Editors and Affiliations

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Novosibirsk, Russia
Victor Malyshkin

About this paper

Cite this paper

Afanasyev, I., Lichmanov, D. (2021). Evaluating the Performance of Kunpeng 920 Processors on Modern HPC Applications. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_23

DOI: https://doi.org/10.1007/978-3-030-86359-3_23
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86358-6
Online ISBN: 978-3-030-86359-3
eBook Packages: Computer Science, Computer Science (R0)
