Improving the efficiency of HPC data movement on container-based virtual cluster

  • Regular Paper
  • CCF Transactions on High Performance Computing

Abstract

Lightweight virtualization technologies are now widely deployed in data centers and on HPC clusters to provide efficient, elastic resource provisioning. Virtualization has also been extended to the I/O stack of the operating system. For example, the virtual switch has become the primary provider of I/O services for data movement among lightweight virtualized instances, such as those managed by Docker and Kubernetes. However, I/O stack virtualization introduces performance degradation and scalability bottlenecks into the data movement of HPC computing frameworks, such as MPI-based collective data movement and bursty asynchronous data movement. To study these bottlenecks, we quantify and analyze the performance degradation of HPC data movement on virtual clusters. We then design a set of two-stage methods that proactively adapt the virtual network and the data movement procedures, improving the performance of HPC collective data movement by up to 3×. In addition, we design a cross-layer middleware to improve the performance and scalability of bursty asynchronous data movement. Our evaluation shows that it improves the performance of a real scientific application by 34.6%.

Acknowledgements

This work is supported by the National Key R&D Program of China under Grant No. 2018YFB0204303, NSFC Grant U1811461, the Guangdong Natural Science Foundation under Grant 2018B030312002, and the Major Program of Guangdong Basic and Applied Research under Grant 2019B030302002.

Author information

Corresponding author

Correspondence to Yutong Lu.

About this article

Cite this article

Huang, D., Lu, Y. Improving the efficiency of HPC data movement on container-based virtual cluster. CCF Trans. HPC 2, 67–80 (2020). https://doi.org/10.1007/s42514-020-00025-w

  • DOI: https://doi.org/10.1007/s42514-020-00025-w
