
Predicting HPC parallel program performance based on LLVM compiler

Published in Cluster Computing.

Abstract

Performance prediction of parallel programs plays a key role in many areas, such as parallel system design, parallel program optimization, and parallel system procurement. Accurate and efficient performance prediction on large-scale parallel systems is a challenging problem. To address it, we present an effective framework for performance prediction based on the LLVM compiler infrastructure. Using this framework, we can predict the performance of a parallel program with only a small number of nodes of the target parallel system, without executing the program on the corresponding full-scale system. The framework predicts the performance of the computation and communication components separately and combines the two predictions into a full-program prediction. For sequential computation, we first combine static branch probability with loop trip count identification and propose a new instrumentation method to count the occurrences of each instruction type. We then construct a test program to measure the average execution time of each instruction type. Finally, we use a pruning technique to convert a parallel program into a corresponding sequential program, so that its performance can be predicted on a single node of the target parallel system. For communication, we use the LogGP model for point-to-point communication and an artificial neural network for collective communication. We validate our approach with a set of experiments predicting the performance of the NAS parallel benchmarks and the CGPOP parallel application. Experimental results show that the proposed framework accurately predicts the execution time of parallel programs, with an average error rate of 10.86%.
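To make the communication-modeling step concrete, the following is a minimal sketch of the LogGP point-to-point model referenced in the abstract (Alexandrov et al., ref. 9), together with the additive combination of computation and communication predictions. The parameter values and the helper names (`loggp_p2p_time`, `predict_total_time`) are hypothetical illustrations, not the paper's implementation; in practice the LogGP parameters are measured on the target network, e.g. with a tool such as Netgauge (ref. 10).

```python
def loggp_p2p_time(k, L, o, G):
    """Predicted one-way time for a k-byte message under LogGP:
    sender overhead o, per-byte gap G for the remaining k - 1 bytes,
    wire latency L, then receiver overhead o."""
    return o + (k - 1) * G + L + o

def predict_total_time(t_comp, t_comm):
    """The framework predicts computation and communication separately
    and combines them; a simple additive combination is assumed here."""
    return t_comp + t_comm

# Hypothetical parameters: 5 us latency, 2 us overhead, 0.5 ns/byte gap.
L, o, G = 5e-6, 2e-6, 0.5e-9
t_comm = loggp_p2p_time(4096, L, o, G)
t_total = predict_total_time(1.2e-3, t_comm)
```

With these made-up parameters, a 4 KiB message costs roughly the latency plus overheads plus about 2 microseconds of per-byte gap; the real framework would substitute measured parameters and a trained neural-network model for collectives.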


References

  1. Top 500 supercomputer site. http://www.top500.org

  2. Kerbyson, D.J., Alme, H.J., Hoisie, A., Petrini, F., Wasserman, H.J., Gittings, M.: Predictive performance and scalability modeling of a large-scale application. In: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing. ACM (2001)

  3. Sharapov, I., Kroeger, R., Delamarter, G., Cheveresan, R., Ramsay, M.: A case study in top-down performance estimation for a large-scale parallel application. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 81–89. ACM (2006)

  4. Snavely, A., Carrington, L., Wolter, N., Labarta, J., Badia, R., Purkayastha, A.: A framework for performance modeling and prediction. In: Supercomputing, ACM/IEEE 2002 Conference. IEEE (2002)

  5. Zheng, G., Kakulapati, G., Kalé, L.V.: Bigsim: a parallel simulator for performance prediction of extremely large parallel machines. In: Proceedings of the 18th International Parallel and Distributed Processing Symposium. IEEE (2004)

  6. Zhang, W., Cheng, A.M., Subhlok, J.: DwarfCode: a performance prediction tool for parallel applications. IEEE Trans. Comput. 65(2), 495–507 (2016)


  7. Message passing interface forum. http://www.mpiforum.org/

  8. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 75–86. IEEE (2004)

  9. Alexandrov, A., Ionescu, M.F., Schauser, K.E., Scheiman, C.: LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation. In: Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 95–105. ACM (1995)

  10. Hoefler, T., Mehlan, T., Lumsdaine, A., Rehm, W.: Netgauge: a network performance measurement framework. In: International Conference on High Performance Computing and Communications, pp. 659–671. Springer, Berlin (2007)

  11. SKaMPI project. http://liinwww.ira.uka.de/~skampi/

  12. Wu, Y., Larus, J.R.: Static branch frequency and program profile analysis. In: Proceedings of the 27th Annual International Symposium on Microarchitecture, pp. 1–11. ACM (1994)

  13. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools, pp. 670–671. Addison-Wesley, Boston (1986)


  14. Hoefler, T., Lichei, A., Rehm, W.: Low-overhead LogGP parameter assessment for modern interconnection networks. In: 2007 IEEE International Parallel and Distributed Processing Symposium, pp. 1–8. IEEE (2007)

  15. Pješivac-Grbović, J., Angskun, T., Bosilca, G., Fagg, G.E., Gabriel, E., Dongarra, J.J.: Performance analysis of MPI collective operations. Clust. Comput. 10(2), 127–143 (2007)


  16. OpenMPI project. https://www.open-mpi.org/

  17. Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Dagum, L., Simon, H.D.: The NAS parallel benchmarks—summary and preliminary results. In: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, pp. 158–165. ACM (1991)

  18. Stone, A., Dennis, J.M., Strout, M.M.: The CGPOP miniapp, version 1.0. Colorado State University, Technical Report CS-11-103 (2011)

  19. Velho, P., Legrand, A.: Accuracy study and improvement of network simulation in the SimGrid framework. In: Proceedings of the Second International Conference on Simulation Tools and Techniques. ICST (2009)

  20. Malyshkin, V.E.: Peculiarities of numerical algorithms parallel implementation for exa-flops multicomputers. Int. J. Big Data Intell. 1(1–2), 65–73 (2014)


  21. Viswanathan, V.: Discovery of semantic associations in an RDF graph using bi-directional BFS on massively parallel hardware. Int. J. Big Data Intell. 3(3), 176–181 (2016)


  22. Wu, C.C., Ke, J.Y., Lin, H., Jhan, S.S.: Adjusting thread parallelism dynamically to accelerate dynamic programming with irregular workload distribution on GPGPUs. Int. J. Grid High Perform. Comput. (IJGHPC) 6(1), 1–20 (2014)


  23. Barker, K.J., Pakin, S., Kerbyson, D.J.: A performance model of the krak hydrodynamics application. In: 2006 International Conference on Parallel Processing (ICPP’06), pp. 245–254. IEEE (2006)

  24. Hoisie, A., Lubeck, O., Wasserman, H.: Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications. Int. J. High Perform. Comput. Appl. 14(4), 330–346 (2000)


  25. Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM (2013)

  26. Cascaval, C., DeRose, L., Padua, D.A., Reed, D.A.: Compile-time based performance prediction. In: International Workshop on Languages and Compilers for Parallel Computing, pp. 365–379. Springer, Berlin (1999)

  27. Zhai, J., Chen, W., Zheng, W.: Phantom: predicting performance of parallel applications on large-scale parallel machines using a single node. In: ACM SIGPLAN Notices, vol. 45, no. 5, pp. 305–314. ACM (2010)


Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0800801, the National Science Foundation of China (NSFC) under Grant Nos. 61672186, 61472108, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20132302110037.


Corresponding author

Correspondence to Weizhe Zhang.


Cite this article

Zhang, W., Hao, M. & Snir, M. Predicting HPC parallel program performance based on LLVM compiler. Cluster Comput 20, 1179–1192 (2017). https://doi.org/10.1007/s10586-016-0707-1

