
Predictive communication modeling for HPC applications

Published in Cluster Computing.

Abstract

In this paper, we present a methodology for predictive modeling of the communication time of HPC applications. Communication time depends on a complex set of parameters related to the application, the system architecture, the runtime configuration and the runtime conditions. To handle this complexity, we define features that can be extracted from the application, the process mapping and the allocation shape ahead of execution, deploy a single benchmark to sweep over the parameter space, and develop predictive models for communication time on two supercomputers, Vilje and Piz Daint, using different subsets of our features, machine-learning methods and training sets. We compare the predictive power of our models on two common communication patterns and one application, for various problem sizes, executions and runtime configurations, ranging from a few dozen to a few thousand cores. Our methodology is successful across all tested communication patterns on both systems and exhibits high prediction accuracy and goodness-of-fit, scoring 23.98% in MMRE, 0.942 in RCC and 61.43% in \(Pred_{0.25}\) on Vilje, and 21.31%, 0.940 and 66.57% respectively on Piz Daint, with models that are applicable just-in-time, ahead of the execution of an HPC application.
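The reported metrics have standard definitions: MMRE is the mean magnitude of relative error, \(Pred_{0.25}\) is the fraction of predictions whose relative error stays within 25%, and RCC is a rank correlation coefficient. A minimal Python sketch of these definitions (function names are ours, and we assume RCC means Spearman's rank correlation, which the abstract does not specify):

```python
def mmre(actual, predicted):
    # Mean Magnitude of Relative Error: mean of |y - y_hat| / y over all samples.
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred_at(actual, predicted, q=0.25):
    # Pred_q: fraction of predictions whose relative error is at most q.
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= q)
    return hits / len(actual)

def _ranks(values):
    # Assign average ranks (1-based), averaging over tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rcc(actual, predicted):
    # Spearman rank correlation: Pearson correlation of the two rank vectors.
    ra, rp = _ranks(actual), _ranks(predicted)
    n = len(ra)
    ma, mp = sum(ra) / n, sum(rp) / n
    cov = sum((x - ma) * (y - mp) for x, y in zip(ra, rp))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sp = sum((y - mp) ** 2 for y in rp) ** 0.5
    return cov / (sa * sp)
```

Under these definitions, the abstract's \(Pred_{0.25}=61.43\%\) on Vilje means that roughly 6 out of 10 predicted communication times fall within 25% of the measured time.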



Notes

  1. http://www.top500.org (June 2016).

  2. http://www.cscs.ch/computers/piz_daint_piz_dora/index.html.

  3. https://www.hpc.ntnu.no/display/hpc/Vilje.

  4. http://hama.apache.org.

  5. http://www.adaptivecomputing.com/products/open-source/torque/.

  6. http://slurm.schedmd.com/.

  7. https://linux.die.net/man/8/ibstat.

  8. http://pubs.cray.com/#/Collaborate/00256453-FA.

  9. https://codesign.llnl.gov/amg2013.php.

  10. https://codesign.llnl.gov/kripke.php.

  11. http://www.exmatex.org/comd.html.

  12. https://mantevo.org/.

  13. https://www.nas.nasa.gov/publications/npb.html.

  14. https://ccse.lbl.gov/ExaCT/index.html.

  15. https://github.com/etmc/tmLQCD.

  16. http://physics.indiana.edu/~sg/milc.html.

  17. http://www.prace-ri.eu/ueabs/#QCD.

  18. https://codesign.llnl.gov/lassen.php.

  19. https://cesar.mcs.anl.gov/content/software/thermal_hydraulics.

  20. http://icl.cs.utk.edu/hpcc/.


Acknowledgements

This research was supported in part with computational resources at the Norwegian University of Science and Technology (NTNU) provided by NOTUR, and in part by a grant of computational resources from the Swiss National Supercomputing Centre (CSCS) under project ID g83. Nikela Papadopoulou has received funding from the IKY Fellowships of Excellence for Postgraduate Studies in Greece (SIEMENS Program). The authors would also like to thank Sotirios Apostolakis for his contribution to the early steps of this work.

Author information


Corresponding author

Correspondence to Nikela Papadopoulou.


Cite this article

Papadopoulou, N., Goumas, G. & Koziris, N. Predictive communication modeling for HPC applications. Cluster Comput 20, 2725–2747 (2017). https://doi.org/10.1007/s10586-017-0821-8
