Distributed Training on a Highly Heterogeneous HPC System

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12471)

Abstract

In recent years, HPC systems have been targeted as suitable platforms for running deep learning workloads, and a number of machine learning libraries now exist for different HPC computing platforms. In the context of the European DeepHealth project, the European Distributed Deep Learning library (EDDL) and the European Computer Vision library (ECVL) have been developed. These libraries target heterogeneous HPC systems including multi/many-core processors (CPUs), GPUs and FPGAs. In this paper we describe the approach followed within the project to exploit HPC resources in an efficient and transparent manner, with a special focus on FPGAs. The complete process is hidden from the end user, which reduces the complexity of running deep learning workloads on heterogeneous systems.

Acknowledgment

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825111.

Author information

Corresponding author

Correspondence to Jose Flich.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Flich, J., Hernandez, C., Quiñones, E., Paredes, R. (2020). Distributed Training on a Highly Heterogeneous HPC System. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2020. Lecture Notes in Computer Science, vol 12471. Springer, Cham. https://doi.org/10.1007/978-3-030-60939-9_26

  • DOI: https://doi.org/10.1007/978-3-030-60939-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60938-2

  • Online ISBN: 978-3-030-60939-9

  • eBook Packages: Computer Science, Computer Science (R0)
