Abstract
As more and more “AI + Field + HPC” applications emerge, two problems have become urgent: scheduling and managing High-Performance Computing (HPC) resources, and delivering HPC applications quickly and efficiently as cloud services. This engineering problem is particularly critical because it affects the progress of scientific research, the development cycle of research platforms, and the learning cost for scientists. To address it, we design a reusable set of life cycle processes for HPC resources. Based on this life cycle, we propose an open service interface for HPC that uses a contention lock to reduce startup time under repeated refreshes and abnormal retries. Active interruption by users is a typical scenario in the startup phase. Furthermore, we implement a read-write strategy based on Singularity overlays to save storage space and improve running speed. To evaluate the serviceability and performance of the proposed interface, we deploy the service on the Venus platform and run a startup comparison experiment. We also measure the storage reduction for 100 users. The experimental results show that, in an HPC environment managed by SLURM, the proposed open service interface can shorten the startup time of applications and services by 46% and reduce storage by at least 25% for each user of the Venus platform.
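The abstract does not describe how the contention lock is implemented. As a rough illustration of the general idea only, the sketch below lets concurrent startup requests for the same service contend for a per-key lock: the winner submits the job, and the others reuse its result instead of submitting duplicates. The class `StartupCoordinator`, the `submit_job` callable, and the key format are all hypothetical names introduced here. An in-process `threading.Lock` stands in for whatever lock a multi-node service would actually use.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class StartupCoordinator:
    """Deduplicate concurrent startup requests per key (illustrative only)."""

    def __init__(self, submit_job):
        self._submit_job = submit_job   # hypothetical callable that launches a job
        self._guard = threading.Lock()  # protects the two dictionaries below
        self._locks = {}                # key -> lock that requests contend for
        self._running = {}              # key -> id of the job already started

    def start(self, key):
        # Fetch (or create) the lock for this key without racing other threads.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        # Requests for the same key contend here; only one proceeds at a time.
        with lock:
            job_id = self._running.get(key)
            if job_id is None:
                # The winner submits. If submission raises, nothing is
                # recorded, so a later retry can submit again.
                job_id = self._submit_job(key)
                self._running[key] = job_id
            return job_id


def demo():
    calls = []

    def fake_submit(key):
        # Stand-in for a slow scheduler submission (assumption, not SLURM).
        calls.append(key)
        time.sleep(0.05)
        return f"job-{len(calls)}"

    coordinator = StartupCoordinator(fake_submit)
    # Eight simultaneous "refreshes" of the same service by one user.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(coordinator.start, ["alice/jupyter"] * 8))
    return calls, results


if __name__ == "__main__":
    calls, results = demo()
    print(len(calls), set(results))
```

In this sketch the eight simulated refreshes lead to a single submission, and every caller receives the same job id. A request that fails during submission leaves no record behind, so a retry starts cleanly.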
Acknowledgments
This work was supported by the National Key R&D Program of China (No. 2020AAA0105202).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wan, M. et al. (2023). OpenVenus: An Open Service Interface for HPC Environment Based on SLURM. In: Qiu, M., Lu, Z., Zhang, C. (eds) Smart Computing and Communication. SmartCom 2022. Lecture Notes in Computer Science, vol 13828. Springer, Cham. https://doi.org/10.1007/978-3-031-28124-2_13
DOI: https://doi.org/10.1007/978-3-031-28124-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28123-5
Online ISBN: 978-3-031-28124-2
eBook Packages: Computer Science, Computer Science (R0)