Improving the efficiency of HPC data movement on container-based virtual cluster

Huang, Dan; Lu, Yutong

doi:10.1007/s42514-020-00025-w

Regular Paper
Published: 10 March 2020

Volume 2, pages 67–80, (2020)

CCF Transactions on High Performance Computing

Dan Huang¹ & Yutong Lu¹

Abstract

Today, lightweight virtualization technologies are widely deployed in data centers and HPC clusters to provide highly efficient and elastic resource provisioning. Virtualization has also been extended to the I/O stack of the operating system. For example, the virtual switch has become the primary provider of I/O services for data movement among various lightweight virtual machines, such as Docker and Kubernetes. However, I/O stack virtualization introduces performance degradation and scalability bottlenecks into the data movements of HPC computing frameworks, such as MPI-based collective data movements and bursty asynchronous data movements. To study these bottlenecks, we quantify and analyze the performance degradation involved in HPC data movements on virtual clusters. We then design a set of two-stage methods that proactively adapt the virtual network and the data movement procedures, which can enhance the performance of HPC collective data movements by up to 3×. In addition, we design a cross-layer middleware to improve the performance and scalability of bursty asynchronous data movements. Our evaluation shows that it can improve the performance of a real scientific application by 34.6%.

Acknowledgements

This work is supported by the National Key R&D Program of China under Grant No. 2018YFB0204303, NSFC U1811461, Guangdong Natural Science Foundation 2018B030312002, and the Major Program of Guangdong Basic and Applied Research 2019B030302002.

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Dan Huang & Yutong Lu

Corresponding author

Correspondence to Yutong Lu.

About this article

Cite this article

Huang, D., Lu, Y. Improving the efficiency of HPC data movement on container-based virtual cluster. CCF Trans. HPC 2, 67–80 (2020). https://doi.org/10.1007/s42514-020-00025-w

Received: 01 November 2019
Accepted: 22 February 2020
Published: 10 March 2020
Issue Date: March 2020
DOI: https://doi.org/10.1007/s42514-020-00025-w