MPI-Performance-Aware-Reallocation: method to optimize the mapping of processes applied to a cloud infrastructure


Abstract

The cloud brings new possibilities for running traditional HPC applications, given its flexibility and reduced cost. However, running MPI applications in the cloud can appreciably reduce their performance, because the cloud hides its internal network topology information, and existing topology-aware techniques for optimizing MPI communications cannot be directly applied to virtualized infrastructures. This paper presents the MPI-Performance-Aware-Reallocation (MPAR) method, a general approach to improve MPI communications. This new approach: (i) is not tied to any specific software or hardware infrastructure, (ii) is applicable to the cloud, (iii) abstracts the network topology by performing experimental tests, and (iv) is able to improve the performance of the user's MPI application via the reallocation of the involved MPI processes. MPAR has been demonstrated for cloud infrastructures through the implementation of the Latency-Aware-MPI-Cloud-Scheduler (LAMPICS) layer. LAMPICS is able to improve the latency of MPI communications in clouds without creating ad-hoc MPI implementations or modifying the source code of users' MPI applications. We have tested LAMPICS with the Sendrecv micro benchmark provided by the Intel MPI Benchmarks, obtaining performance improvements of up to 70%, and with two real-world applications from the Unified European Applications Benchmark Suite, obtaining performance improvements of up to 26.5%.
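As an informal illustration of the latency-probing idea described above (this is a minimal sketch, not the authors' LAMPICS implementation, which is not reproduced here), the following C/MPI program measures pairwise one-way latency between ranks with MPI_Sendrecv and gathers the resulting matrix on rank 0, where a reallocation policy could consume it. The repetition count, message size, and output format are illustrative assumptions.

/*
 * Minimal sketch: probe pairwise MPI latency with MPI_Sendrecv, as a
 * scheduling layer might do before deciding how to reallocate processes
 * onto cloud VMs. All constants below are illustrative assumptions.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS      1000   /* ping-pong repetitions per pair (assumed value) */
#define MSG_BYTES 1      /* small message to expose latency, not bandwidth */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char sbuf[MSG_BYTES] = {0}, rbuf[MSG_BYTES];
    double *lat = calloc((size_t)size, sizeof(double)); /* my row of the latency matrix */

    /* Probe every (i, j) pair; only the two involved ranks exchange messages. */
    for (int i = 0; i < size; i++) {
        for (int j = i + 1; j < size; j++) {
            MPI_Barrier(MPI_COMM_WORLD);
            if (rank == i || rank == j) {
                int peer = (rank == i) ? j : i;
                double t0 = MPI_Wtime();
                for (int r = 0; r < REPS; r++) {
                    MPI_Sendrecv(sbuf, MSG_BYTES, MPI_CHAR, peer, 0,
                                 rbuf, MSG_BYTES, MPI_CHAR, peer, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                }
                /* Half the average round-trip time approximates the one-way latency. */
                lat[peer] = (MPI_Wtime() - t0) / (2.0 * REPS);
            }
        }
    }

    /* Rank 0 gathers the full matrix; a reallocation policy would consume it here. */
    double *matrix = NULL;
    if (rank == 0) matrix = malloc((size_t)size * size * sizeof(double));
    MPI_Gather(lat, size, MPI_DOUBLE, matrix, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                if (i != j)  /* diagonal entries (self-latency) are left at zero */
                    printf("latency[%d][%d] = %.3f us\n", i, j, matrix[i * size + j] * 1e6);
        free(matrix);
    }
    free(lat);
    MPI_Finalize();
    return 0;
}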


References

  1. Al-Tawil K, Moritz CA (2001) Performance modeling and evaluation of MPI. J Parallel Distrib Comput 61(2):202–223. doi:10.1006/jpdc.2000.1677

  2. Enkovaara J, Rostgaard C, Mortensen JJ, Chen J, Dułak M, Ferrighi L, Gavnholt J, Glinsvad C, Haikola V, Hansen HA, Kristoffersen HH, Kuisma M, Larsen AH, Lehtovaara L, Ljungberg M, Lopez-Acevedo O, Moses PG, Ojanen J, Olsen T, Petzold V, Romero NA, Stausholm-Møller J, Strange M, Tritsaris GA, Vanin M, Walter M, Hammer B, Häkkinen H, Madsen GKH, Nieminen RM, Nørskov JK, Puska M, Rantala TT, Schiøtz J, Thygesen KS, Jacobsen KW (2010) Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. J Phys Condens Matter 22(25):253202. doi:10.1088/0953-8984/22/25/253202

  3. Gong Y, He B, Zhong J (2015) Network performance aware MPI collective communication operations in the cloud. IEEE Trans Parallel Distrib Syst 26(11):3079–3089. doi:10.1109/TPDS.2013.96

  4. Hurwitz JG, Feng WC (2005) Analyzing MPI performance over 10-gigabit ethernet. J Parallel Distrib Comput 65(10):1253–1260. doi:10.1016/j.jpdc.2005.04.011

  5. Intel Corporation (2016) Intel MPI Benchmarks, user guide and methodology description. http://software.intel.com/en-us/articles/intel-mpi-benchmarks/. Accessed 17 July 2017

  6. Jackson KR, Ramakrishnan L, Muriki K, Canon S, Cholia S, Shalf J, Wasserman HJ, Wright NJ (2010) Performance analysis of high performance computing applications on the Amazon Web Services cloud. In: 2010 IEEE second international conference on cloud computing technology and science, IEEE, pp 159–168. doi:10.1109/CloudCom.2010.69

  7. Kandalla K, Subramoni H, Vishnu A, Panda DK (2010) Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: case studies with scatter and gather. In: 2010 IEEE international symposium on parallel and distributed processing, workshops and PhD forum (IPDPSW), IEEE, pp 1–8. doi:10.1109/IPDPSW.2010.5470853

  8. Le TT, Rejeb J (2006) A detailed MPI communication model for distributed systems. Future Gener Comput Syst 22(3):269–278. doi:10.1016/j.future.2005.08.005

  9. Liu J, Chandrasekaran B, Wu J, Jiang W, Kini S, Yu W, Buntinas D, Wyckoff P, Panda DK (2003) Performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. In: Proceedings of the 2003 ACM/IEEE conference on supercomputing (SC '03), ACM Press, New York, p 58. doi:10.1145/1048935.1050208

  10. Martinez DR, Cabaleiro JC, Pena TF, Rivera FF, Blanco V (2009) Accurate analytical performance model of communications in MPI applications. In: 2009 IEEE international symposium on parallel and distributed processing, IEEE, pp 1–8. doi:10.1109/IPDPS.2009.5161175

  11. Rak M, Turtur M, Villano U (2014) Early prediction of the cost of HPC application execution in the cloud. In: 2014 16th international symposium on symbolic and numeric algorithms for scientific computing, IEEE, pp 409–416. doi:10.1109/SYNASC.2014.61

  12. Schulz M, Bhatele A, Bremer PT, Gamblin T, Isaacs K, Levine JA, Pascucci V (2012) Creating a tool set for optimizing topology-aware node mappings. In: Brunst H, Müller MS, Nagel WE, Resch MM (eds) Tools for high performance computing 2011, chap 1, Springer, Berlin, pp 1–12. doi:10.1007/978-3-642-31476-6_1

  13. Skinner D (2005) Performance monitoring of parallel scientific applications. Technical report, Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA. doi:10.2172/881368

  14. Spiridon VL, Slusanschi EI (2013) N-body simulations with GADGET-2. In: 2013 15th international symposium on symbolic and numeric algorithms for scientific computing, IEEE, pp 526–533. doi:10.1109/SYNASC.2013.75

  15. Springel V (2005) The cosmological simulation code GADGET-2. Mon Not R Astron Soc 364(4):1105–1134. doi:10.1111/j.1365-2966.2005.09655.x

  16. Subramoni H, Kandalla K, Vienne J, Sur S, Barth B, Tomko K, Mclay R, Schulz K, Panda D (2011) Design and evaluation of network topology-/speed-aware broadcast algorithms for InfiniBand clusters. In: 2011 IEEE international conference on cluster computing, IEEE, pp 317–325. doi:10.1109/CLUSTER.2011.43

  17. Bull M (2013) Unified European Applications Benchmark Suite. Seventh Framework Programme Research Infrastructures, European High Performance Computing (HPC) service PRACE. http://www.prace-ri.eu/ueabs/. Accessed 13 July 2017

  18. Ye K, Jiang X, Ma R, Yan F (2012) VC-migration: live migration of virtual clusters in the cloud. In: 2012 ACM/IEEE 13th international conference on grid computing, IEEE, pp 209–218. doi:10.1109/Grid.2012.27

  19. Zhai Y, Liu M, Zhai J, Ma X, Chen W (2011) Cloud versus in-house cluster. In: State of the practice reports, SC '11, ACM Press, New York, p 1. doi:10.1145/2063348.2063363


Author information

Corresponding author

Correspondence to F. Gomez-Folgar.

Additional information

This work has been supported by FEDER funds and by the Spanish Government (MCYT) under projects TIN-2013-41129-P, TIN-2016-76373-P and TEC2014-59402-JIN, and by the Spanish Ministry of Education, Culture and Sports under FPU grants FPU12/05190 and FPU12/02916.


About this article


Cite this article

Gomez-Folgar, F., Indalecio, G., Seoane, N. et al. MPI-Performance-Aware-Reallocation: method to optimize the mapping of processes applied to a cloud infrastructure. Computing 100, 211–226 (2018). https://doi.org/10.1007/s00607-017-0573-6

