Abstract
Traditional high performance computing (HPC) systems provide a standard preset environment to support scientific computation. However, HPC systems increasingly need to support more diverse applications, such as artificial intelligence and big data, and the standard preset environment can no longer meet these requirements. To run such emerging applications on HPC systems, users must manually maintain their applications' specific dependencies (libraries, environment variables, and so on), which increases the development and deployment burden. Moreover, the multi-user mode raises privacy problems among users. Containers such as Docker and Singularity can encapsulate a job's execution environment, but in a highly customized HPC system their cross-environment application deployment is limited, and the introduction of container images imposes a maintenance burden on system administrators. To address these problems, in this paper we propose a self-deployed execution environment (SDEE) for HPC. SDEE combines the advantages of traditional virtualization and modern containers: it provides each user with an isolated, customizable environment (similar to a virtual machine) in which the user is the root user. The user develops and debugs the application and deploys its special dependencies in this environment, and can then load the job onto compute nodes directly through the traditional HPC job management system. The job and its dependencies are analyzed, packaged, deployed, and executed automatically. This process enables transparent and rapid job deployment, which not only reduces the burden on users but also protects user privacy. Experiments show that the overhead introduced by SDEE is negligible and lower than that of both Docker and Singularity.
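The deployment workflow the abstract describes — analyze a job's dependency files, package them, deploy them to a compute node, and execute — can be reduced to a file-level sketch. The following is a minimal illustration only, not the authors' implementation; the function names `package_job` and `deploy_job` are hypothetical.

```python
import os
import tarfile

def package_job(job_files, archive_path):
    """Bundle a job and its dependency files into a single archive,
    so the user's customized environment travels with the job."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in job_files:
            tar.add(path, arcname=os.path.basename(path))

def deploy_job(archive_path, node_dir):
    """Unpack the archive into a per-job directory on a compute node,
    reproducing the dependencies the job was developed against."""
    os.makedirs(node_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(node_dir)

if __name__ == "__main__":
    import tempfile
    work = tempfile.mkdtemp()
    dep = os.path.join(work, "libdemo.so")  # stand-in for a real dependency
    with open(dep, "w") as f:
        f.write("fake shared library")
    archive = os.path.join(work, "job.tar.gz")
    package_job([dep], archive)
    node = os.path.join(work, "compute_node")
    deploy_job(archive, node)
    print(sorted(os.listdir(node)))  # the dependency arrives intact
```

In SDEE the analysis, packaging, and deployment steps are automatic and integrated with the job management system; the sketch above only shows the packaging/unpacking core of such a pipeline.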
Acknowledgements
The authors wish to thank Yiqing DAI, Kun ZHANG, and Hao HAN for their help in system debugging. We would also like to thank Zhenwei WU and Yushuqing ZHANG for improving the paper.
Contributions
Mingtian SHAO designed the research. Mingtian SHAO, Kai LU, and Wenzhe ZHANG implemented the system. Mingtian SHAO drafted the paper. Kai LU and Wenzhe ZHANG helped organize the paper. Mingtian SHAO revised and finalized the paper.
Compliance with ethics guidelines
Mingtian SHAO, Kai LU, and Wenzhe ZHANG declare that they have no conflict of interest.
Project supported by the Tianhe Supercomputer Project (No. 2018YFB0204301), the National Natural Science Foundation of China (No. 61902405), the PDL Research Fund (No. 6142110190404), and the National High-Level Personnel for Defense Technology Program (No. 2017-JCJQ-ZQ-013).
Cite this article
Shao, M., Lu, K. & Zhang, W. Self-deployed execution environment for high performance computing. Front Inform Technol Electron Eng 23, 845–857 (2022). https://doi.org/10.1631/FITEE.2100016