Abstract
In recent years, HPC systems have increasingly been targeted as platforms for running deep learning workloads, and a number of machine learning libraries now target different HPC computing platforms. Within the European DeepHealth project, two libraries have been developed: the European Distributed Deep Learning Library (EDDL) and the European Computer Vision Library (ECVL). Both target heterogeneous HPC systems that combine multi/many-core processors (CPUs), GPUs and FPGAs. In this paper we describe how the project exploits these HPC resources efficiently and transparently, with special focus on FPGAs. The whole process is hidden from the end user, which reduces the complexity of running deep learning workloads on heterogeneous systems.
Acknowledgment
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825111.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Flich, J., Hernandez, C., Quiñones, E., Paredes, R. (2020). Distributed Training on a Highly Heterogeneous HPC System. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2020. Lecture Notes in Computer Science, vol 12471. Springer, Cham. https://doi.org/10.1007/978-3-030-60939-9_26
DOI: https://doi.org/10.1007/978-3-030-60939-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60938-2
Online ISBN: 978-3-030-60939-9
eBook Packages: Computer Science, Computer Science (R0)