Predicting HPC parallel program performance based on LLVM compiler

Zhang, Weizhe; Hao, Meng; Snir, Marc

doi:10.1007/s10586-016-0707-1

Published: 27 December 2016

Cluster Computing, volume 20, pages 1179–1192 (2017)

Abstract

Performance prediction of parallel programs plays a key role in many areas, such as parallel system design, parallel program optimization, and parallel system procurement. Accurate and efficient performance prediction on large-scale parallel systems is a challenging problem. To address it, we present a framework for performance prediction based on the LLVM compiler. With this framework, the performance of a parallel program can be predicted using only a small number of nodes of the target parallel system, without executing the program on the corresponding full-scale system. The framework predicts the performance of the computation and communication components separately and combines the two predictions to obtain a prediction for the full program. For sequential computation, we first combine static branch probability with loop trip count identification and propose a new instrumentation method to obtain the number of instructions of each type. We then construct a test program to measure the average execution time of each instruction type. Finally, we use a pruning technique to convert the parallel program into a corresponding sequential program, so that its performance can be predicted on only one node of the target parallel system. For communication, we use the LogGP model for point-to-point communication and an artificial neural network for collective communication. We validate our approach with a set of experiments that predict the performance of the NAS Parallel Benchmarks and the CGPOP parallel application. Experimental results show that the proposed framework can accurately predict the execution time of parallel programs, with an average error rate of 10.86% across these programs.
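
The arithmetic behind this kind of prediction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout, and the simple addition of computation and communication time are assumptions made for illustration, and the neural-network model for collective communication is omitted. Computation time is estimated as the sum over instruction types of count times measured average time, and a single point-to-point message of k bytes is costed with the standard LogGP expression L + 2o + (k - 1)G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP network parameters (hypothetical container, units in seconds)."""
    L: float  # end-to-end latency
    o: float  # per-message CPU overhead on sender and on receiver
    g: float  # minimum gap between consecutive messages (unused for one message)
    G: float  # gap per byte for long messages


def computation_time(instr_counts, avg_instr_time):
    """Sum of (instruction count x average time) over all instruction types."""
    return sum(count * avg_instr_time[kind] for kind, count in instr_counts.items())


def p2p_time(params, message_bytes):
    """LogGP cost of one point-to-point message: L + 2o + (k - 1) * G."""
    if message_bytes < 1:
        raise ValueError("message size must be at least 1 byte")
    return params.L + 2 * params.o + (message_bytes - 1) * params.G


def predict_total(instr_counts, avg_instr_time, params, message_sizes):
    """Assumed combination rule: computation plus non-overlapped communication."""
    comp = computation_time(instr_counts, avg_instr_time)
    comm = sum(p2p_time(params, k) for k in message_sizes)
    return comp + comm
```

In this sketch, instr_counts would come from static analysis or instrumentation of the program, avg_instr_time from a micro-benchmark on one node of the target system, and the LogGP parameters from a network measurement tool. Summing the two components assumes that computation and communication do not overlap, which will not hold for every program.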

Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0800801, the National Science Foundation of China (NSFC) under Grant Nos. 61672186, 61472108, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20132302110037.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Weizhe Zhang & Meng Hao
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
Marc Snir

Corresponding author

Correspondence to Weizhe Zhang.

About this article

Cite this article

Zhang, W., Hao, M. & Snir, M. Predicting HPC parallel program performance based on LLVM compiler. Cluster Comput 20, 1179–1192 (2017). https://doi.org/10.1007/s10586-016-0707-1

Received: 22 September 2016
Revised: 21 November 2016
Accepted: 02 December 2016
Published: 27 December 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10586-016-0707-1
