Abstract
The convergence of HPC and Big Data, along with the influence of Cloud, is playing an important role in the democratization of HPC. The growing demand of Data Analytics for computational power has opened new fields of interest for HPC facilities, but it has also raised new issues such as interoperability with Cloud and ease of use. Besides typical HPC applications, these infrastructures are now asked to handle more complex workflows combining Machine Learning, Big Data and HPC. This brings challenges at the resource management, scheduling and environment deployment layers. Hence, enhancements are needed to allow multiple frameworks to be deployed under common system management while providing the right abstraction to facilitate adoption.
This paper presents the architecture adopted for the parallel and distributed execution management software stack of the EU-funded CYBELE project, which is deployed on production HPC centers to execute hybrid data analytics workflows in the context of precision agriculture and livestock farming applications. The design is based on: Kubernetes as a higher-level orchestrator of Big Data components and hybrid workflows, and as a common interface to submit HPC or Big Data jobs; Slurm or Torque for HPC resource management; and the Singularity containerization platform for the dynamic deployment of the different Data Analytics frameworks on HPC. The paper showcases precision agriculture workflows executed on the architecture and provides initial performance evaluation results and insights for the whole prototype design.
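As a rough illustration of how the Slurm and Singularity layers named in the design might fit together, the sketch below renders a batch script that runs one analytics step inside a container. It is not taken from the paper: the `ContainerJob` class, the `render_slurm_script` helper, the image name and the command are all hypothetical, and the real prototype may submit jobs quite differently (for example through Kubernetes, or through Torque instead of Slurm).

```python
from dataclasses import dataclass


@dataclass
class ContainerJob:
    """Hypothetical description of one containerized analytics step."""
    name: str                    # job name shown by the scheduler
    image: str                   # path to a Singularity image file (assumed)
    command: str                 # command to run inside the container
    nodes: int = 1               # number of nodes to request
    walltime: str = "00:30:00"   # time limit, HH:MM:SS


def render_slurm_script(job: ContainerJob) -> str:
    """Render a Slurm batch script that runs the step inside Singularity."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --time={job.walltime}",
        # srun launches the task; singularity exec runs it in the image.
        f"srun singularity exec {job.image} {job.command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; none of these names come from the paper.
    job = ContainerJob(
        name="crop-yield",
        image="analytics.sif",
        command="python train.py",
    )
    print(render_slurm_script(job))
```

The script text could then be handed to `sbatch`. Keeping the rendering step separate from submission makes it easy to swap in another resource manager, which is one plausible reason to place a common submission interface above the HPC scheduler.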
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 825355.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Georgiou, Y. et al. (2020). Converging HPC, Big Data and Cloud Technologies for Precision Agriculture Data Analytics on Supercomputers. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12321. Springer, Cham. https://doi.org/10.1007/978-3-030-59851-8_25
DOI: https://doi.org/10.1007/978-3-030-59851-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59850-1
Online ISBN: 978-3-030-59851-8
eBook Packages: Computer Science, Computer Science (R0)