Abstract
Traditional high performance computing (HPC) systems provide a standard preset environment to support scientific computation. However, HPC systems increasingly need to support more diverse applications, such as artificial intelligence and big data, and the standard preset environment can no longer meet these requirements. To run such emerging applications on HPC systems, users must manually maintain their applications' specific dependencies (libraries, environment variables, and so on), which increases the development and deployment burden. Moreover, the multi-user mode raises privacy problems among users. Containers such as Docker and Singularity can encapsulate a job's execution environment, but in a highly customized HPC system their cross-environment application deployment is limited, and the introduction of container images imposes a maintenance burden on system administrators. To address these problems, in this paper we propose a self-deployed execution environment (SDEE) for HPC. SDEE combines the advantages of traditional virtualization and modern containers: it provides each user with an isolated, customizable environment (similar to a virtual machine) in which the user is the root user. The user develops and debugs the application and deploys its special dependencies in this environment, and can then load the job onto compute nodes directly through the traditional HPC job management system. The job and its dependencies are analyzed, packaged, deployed, and executed automatically. This process enables transparent and rapid job deployment, which not only reduces the burden on users but also protects user privacy. Experiments show that the overhead introduced by SDEE is negligible and lower than that of both Docker and Singularity.
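The deployment workflow the abstract describes — analyze a job's dependency files, package them, deploy them to a compute node, and execute — can be reduced to a file-level sketch. The following is a minimal illustration only, not the authors' implementation; the function names `package_job` and `deploy_job` are hypothetical.

```python
import os
import tarfile

def package_job(job_files, archive_path):
    """Bundle a job and its dependency files into a single archive,
    so the user's customized environment travels with the job."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in job_files:
            tar.add(path, arcname=os.path.basename(path))

def deploy_job(archive_path, node_dir):
    """Unpack the archive into a per-job directory on a compute node,
    reproducing the dependencies the job was developed against."""
    os.makedirs(node_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(node_dir)

if __name__ == "__main__":
    import tempfile
    work = tempfile.mkdtemp()
    dep = os.path.join(work, "libdemo.so")  # stand-in for a real dependency
    with open(dep, "w") as f:
        f.write("fake shared library")
    archive = os.path.join(work, "job.tar.gz")
    package_job([dep], archive)
    node = os.path.join(work, "compute_node")
    deploy_job(archive, node)
    print(sorted(os.listdir(node)))  # the dependency arrives intact
```

In SDEE the analysis, packaging, and deployment steps are automatic and integrated with the job management system; the sketch above only shows the packaging/unpacking core of such a pipeline.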
Acknowledgements
The authors wish to thank Yiqing DAI, Kun ZHANG, and Hao HAN for their help in system debugging. We would also like to thank Zhenwei WU and Yushuqing ZHANG for improving the paper.
Contributions
Mingtian SHAO designed the research. Mingtian SHAO, Kai LU, and Wenzhe ZHANG implemented the system. Mingtian SHAO drafted the paper. Kai LU and Wenzhe ZHANG helped organize the paper. Mingtian SHAO revised and finalized the paper.
Compliance with ethics guidelines
Mingtian SHAO, Kai LU, and Wenzhe ZHANG declare that they have no conflict of interest.
Project supported by the Tianhe Supercomputer Project (No. 2018YFB0204301), the National Natural Science Foundation of China (No. 61902405), the PDL Research Fund (No. 6142110190404), and the National High-Level Personnel for Defense Technology Program (No. 2017-JCJQ-ZQ-013).
Cite this article
Shao, M., Lu, K. & Zhang, W. Self-deployed execution environment for high performance computing. Front Inform Technol Electron Eng 23, 845–857 (2022). https://doi.org/10.1631/FITEE.2100016