Abstract
Performance prediction of parallel programs plays a key role in many areas, such as parallel system design, parallel program optimization, and parallel system procurement. Accurate and efficient performance prediction for large-scale parallel systems is a challenging problem. To address it, we present a framework for performance prediction based on the LLVM compiler infrastructure. With this framework, we can predict the performance of a parallel program using only a small number of nodes of the target parallel system, without executing the program on the corresponding full-scale system. The framework predicts the performance of the computation and communication components separately and combines the two predictions to obtain a prediction for the full program. For sequential computation, we first combine static branch probability with loop trip count identification and propose a new instrumentation method to obtain the number of executed instructions of each type. We then construct a test program to measure the average execution time of each instruction type. Finally, we use a pruning technique to convert the parallel program into a corresponding sequential program, so that its computation performance can be predicted on a single node of the target parallel system. For communication, we use the LogGP model for point-to-point communication and an artificial neural network for collective communication. We validate our approach with a set of experiments that predict the performance of the NAS parallel benchmarks and the CGPOP parallel application. Experimental results show that the proposed framework can accurately predict the execution time of parallel programs, with an average error rate of 10.86% across these programs.
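The prediction scheme outlined above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that computation time is the sum, over instruction types, of instruction count times measured average time, that point-to-point messages follow the standard LogGP cost o + (k - 1)G + L + o for a k-byte message, and that the two components are combined by simple addition with no overlap. All function names, instruction labels, and parameter values are hypothetical, and collective communication (modeled with a neural network in the paper) is omitted.

```python
# Illustrative sketch only; names and numbers are assumptions, not from the paper.

def computation_time(counts, avg_time):
    """Sum of (instruction count * average time) over instruction types."""
    return sum(n * avg_time[op] for op, n in counts.items())


def loggp_p2p_time(k, L, o, G):
    """LogGP cost of one k-byte point-to-point message.

    L: network latency, o: per-message CPU overhead (paid by sender and
    receiver), G: gap per byte for long messages.
    """
    return o + (k - 1) * G + L + o


def predict_total(counts, avg_time, message_sizes, L, o, G):
    """Combine both components, assuming computation and communication do not overlap."""
    comm = sum(loggp_p2p_time(k, L, o, G) for k in message_sizes)
    return computation_time(counts, avg_time) + comm


# Hypothetical inputs, all times in nanoseconds.
counts = {"add": 1000, "fmul": 500}   # instruction counts per type
avg_time = {"add": 1, "fmul": 2}      # measured average time per type
total_ns = predict_total(counts, avg_time, [1025, 1], L=1000, o=200, G=1)
print(total_ns)
```

In practice the counts would come from the instrumented LLVM IR and the LogGP parameters from a network measurement tool; here they are placeholders chosen only to make the arithmetic easy to follow.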
Acknowledgements
This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0800801, the National Science Foundation of China (NSFC) under Grant Nos. 61672186, 61472108, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20132302110037.
About this article
Cite this article
Zhang, W., Hao, M. & Snir, M. Predicting HPC parallel program performance based on LLVM compiler. Cluster Comput 20, 1179–1192 (2017). https://doi.org/10.1007/s10586-016-0707-1
DOI: https://doi.org/10.1007/s10586-016-0707-1