
Predictive communication modeling for HPC applications

Published in Cluster Computing.

Abstract

In this paper, we present a methodology for predictive modeling of the communication time of HPC applications. Communication time depends on a complex set of parameters related to the application, the system architecture, the runtime configuration and the runtime conditions. To handle this complexity, we define features that can be extracted from the application, the process mapping and the allocation shape ahead of execution, deploy a single benchmark to sweep over the parameter space, and develop predictive models for communication time on two supercomputers, Vilje and Piz Daint, using different subsets of our features, machine-learning methods and training sets. We compare the predictive power of our models on two common communication patterns and one application, for various problem sizes, executions and runtime configurations, ranging from a few dozen to a few thousand cores. Our methodology is successful across all tested communication patterns on both systems and exhibits high prediction accuracy and goodness-of-fit, scoring 23.98% in MMRE, 0.942 in RCC and 61.43% in \(Pred_{0.25}\) on Vilje, and 21.31%, 0.940 and 66.57% respectively on Piz Daint, with models that are applicable just-in-time, ahead of the execution of an HPC application.
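The reported metrics have standard definitions: MMRE is the mean magnitude of relative error, \(Pred_{0.25}\) is the fraction of predictions whose relative error stays within 25%, and RCC is a rank correlation coefficient. A minimal Python sketch of these definitions (function names are ours, and we assume RCC means Spearman's rank correlation, which the abstract does not specify):

```python
def mmre(actual, predicted):
    # Mean Magnitude of Relative Error: mean of |y - y_hat| / y over all samples.
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred_at(actual, predicted, q=0.25):
    # Pred_q: fraction of predictions whose relative error is at most q.
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= q)
    return hits / len(actual)

def _ranks(values):
    # Assign average ranks (1-based), averaging over tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rcc(actual, predicted):
    # Spearman rank correlation: Pearson correlation of the two rank vectors.
    ra, rp = _ranks(actual), _ranks(predicted)
    n = len(ra)
    ma, mp = sum(ra) / n, sum(rp) / n
    cov = sum((x - ma) * (y - mp) for x, y in zip(ra, rp))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sp = sum((y - mp) ** 2 for y in rp) ** 0.5
    return cov / (sa * sp)
```

Under these definitions, the abstract's \(Pred_{0.25}=61.43\%\) on Vilje means that roughly 6 out of 10 predicted communication times fall within 25% of the measured time.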



Notes

  1. http://www.top500.org (June 2016).

  2. http://www.cscs.ch/computers/piz_daint_piz_dora/index.html.

  3. https://www.hpc.ntnu.no/display/hpc/Vilje.

  4. http://hama.apache.org.

  5. http://www.adaptivecomputing.com/products/open-source/torque/.

  6. http://slurm.schedmd.com/.

  7. https://linux.die.net/man/8/ibstat.

  8. http://pubs.cray.com/#/Collaborate/00256453-FA.

  9. https://codesign.llnl.gov/amg2013.php.

  10. https://codesign.llnl.gov/kripke.php.

  11. http://www.exmatex.org/comd.html.

  12. https://mantevo.org/.

  13. https://www.nas.nasa.gov/publications/npb.html.

  14. https://ccse.lbl.gov/ExaCT/index.html.

  15. https://github.com/etmc/tmLQCD.

  16. http://physics.indiana.edu/~sg/milc.html.

  17. http://www.prace-ri.eu/ueabs/#QCD.

  18. https://codesign.llnl.gov/lassen.php.

  19. https://cesar.mcs.anl.gov/content/software/thermal_hydraulics.

  20. http://icl.cs.utk.edu/hpcc/.


Acknowledgements

This research was supported in part with computational resources at the Norwegian University of Science and Technology (NTNU) provided by NOTUR, and in part by a grant of computational resources from the Swiss National Supercomputing Centre (CSCS) under project ID g83. Nikela Papadopoulou has received funding from the IKY Fellowships of Excellence for Postgraduate Studies in Greece (SIEMENS Program). The authors would also like to thank Sotirios Apostolakis for his contribution to the early steps of this work.

Author information


Corresponding author

Correspondence to Nikela Papadopoulou.


Cite this article

Papadopoulou, N., Goumas, G. & Koziris, N. Predictive communication modeling for HPC applications. Cluster Comput 20, 2725–2747 (2017). https://doi.org/10.1007/s10586-017-0821-8
