Abstract
This paper evaluates the performance of a hybrid computer cluster built on IBM POWER8 CPUs and NVIDIA Tesla P100 GPUs. The architecture of the computing system and the software used are described. Results of experiments carried out with the STREAM, NPB, Crossroads/NERSC-9 DGEMM, and HPL benchmarks are discussed. The efficiency of the simultaneous multithreading (SMT) technology supported by POWER8 processors is analyzed, along with the performance of several compilers, parallel programming libraries, and mathematical libraries on this architecture.
REFERENCES
Shan, A., Heterogeneous processing: A strategy for augmenting Moore’s law, 2006. https://www.linuxjournal.com/article/8368.
TOP500 supercomputer sites, 2019. https://www.top500.org.
Karkhanis, T.S. and Moreira, J.E., IBM Power architecture, Encyclopedia of Parallel Computing, Padua, D., Ed., Boston: Springer, 2011.
Sinharoy, B., Van Norstrand, J.A., Eickemeyer, R.J., Le, H.Q., Leenstra, J., Nguyen, D.Q., Konigsburg, B., Ward, K., Brown, M.D., Moreira, J.E., Levitan, D., Tung, S., Hrusecky, D., Bishop, J.W., Gschwind, M., Boersma, M., Kroener, M., Kaltenbach, M., Karkhanis, T., and Fernsler, K.M., IBM POWER8 processor core microarchitecture, IBM J. Res. Dev., 2015, vol. 59, no. 1, pp. 2:1–2:21.
Eggers, S.J., Emer, J.S., Levy, H.M., Lo, J.L., Stamm, R.L., and Tullsen, D.M., Simultaneous multithreading: A platform for next-generation processors, IEEE Micro, 1997, vol. 17, no. 5, pp. 12–19.
Starke, W.J., Stuecheli, J., Daly, D.M., Dodson, J.S., Auernhammer, F., Sagmeister, P.M., Guthrie, G.L., Marino, C.F., Siegel, M., and Blaner, B., The cache and memory subsystems of the IBM POWER8 processor, IBM J. Res. Dev., 2015, vol. 59, no. 1, pp. 3:1–3:13.
NVIDIA Tesla P100: The most advanced datacenter accelerator ever built. Featuring Pascal GP100, the world’s fastest GPU, Whitepaper, 2016.
Multi-process service, NVIDIA, 2015.
McCalpin, J.D., Memory bandwidth and machine balance in current high performance computers, IEEE Comput. Soc. Techn. Comm. Comput. Archit. (TCCA) Newsl., 1995.
Bailey, D., Barszcz, E., Barton, J., Browning, D., Carter, R., Dagum, L., Fatoohi, R., Fineberg, S., Frederickson, P., Lasinski, T., Schreiber, R., Simon, H., Venkatakrishnan, V., and Weeratunga, S., The NAS parallel benchmarks, RNR technical report 94-007, 1994.
Saini, S., Chang, J., Hood, R., and Jin, H., A scalability study of Columbia using the NAS parallel benchmarks, Comput. Methods Sci. Technol., 2006, no. 1, pp. 33–45.
Dongarra, J.J., Luszczek, P., and Petitet, A., The LINPACK benchmark: Past, present and future, Concurrency Comput.: Pract. Exper., 2003, vol. 15, no. 9, pp. 803–820.
ESSL guide and reference, IBM, 2016.
Austin, B. and Wright, N.J., Measurement and interpretation of micro-benchmark and application energy use on the Cray XC30, Proc. Energy Efficient Supercomputing Workshop, 2014, pp. 51–59.
Sorokin, A.A., Makogonov, S.I., and Korolev, S.P., The information infrastructure for collective scientific work in the Far East of Russia, Sci. Tech. Inf. Process., 2017, vol. 4, pp. 302–304.
ACKNOWLEDGMENTS
Numerical computations were carried out on equipment provided by the Data Center of the Far Eastern Branch of the Russian Academy of Sciences (Khabarovsk) [15] and by the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (Moscow).
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 18-29-03196.
Additional information
Translated by Yu. Kornienko
Cite this article
Mal’kovskii, S.I., Sorokin, A.A., Korolev, S.P. et al. Performance Evaluation of a Hybrid Computer Cluster Built on IBM POWER8 Microprocessors. Program Comput Soft 45, 324–332 (2019). https://doi.org/10.1134/S0361768819060057
DOI: https://doi.org/10.1134/S0361768819060057