
Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, training on a single machine becomes impractical due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improve performance. The data-parallelism paradigm allows us to split the training dataset into manageable chunks that are processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h 16 min on a single GPU to 2 h 14 min on all 12 GPUs. Considering only the training process, we achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs.
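For reference, scaling efficiency on N workers can be expressed as E = T_1 / (N × T_N). Applied to the end-to-end durations quoted above (1216 min on 1 GPU versus 134 min on 12 GPUs), the all-inclusive efficiency is 1216 / (12 × 134) ≈ 75.6%; the 98.9% figure refers to the training loop alone, excluding evaluation.

The sketch below illustrates the data-parallel pattern in general terms, assuming the Keras/TensorFlow/Horovod stack the authors reference. The model, dataset, and hyperparameters are placeholders for illustration only, not the authors' 3D GAN setup.

```python
# Minimal sketch of data-parallel training with Horovod and Keras.
# Placeholder model and data; the paper trains a 3D convolutional GAN instead.
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to a single GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Placeholder model (illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
    tf.keras.layers.Dense(1),
])

# Scale the learning rate with the number of workers and wrap the optimizer
# so that gradients are averaged across workers via all-reduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
model.compile(optimizer=opt, loss='mse')

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Data parallelism: each worker trains on its own shard of the dataset.
x = np.random.rand(1024, 64).astype('float32')
y = np.random.rand(1024, 1).astype('float32')
shard_x = x[hvd.rank()::hvd.size()]
shard_y = y[hvd.rank()::hvd.size()]

model.fit(shard_x, shard_y, batch_size=32, epochs=2,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```

Launched with one process per GPU (e.g. via mpirun or horovodrun), each worker computes gradients on its shard and the wrapped optimizer averages them across all workers before the update, which is the essence of the data-parallel approach evaluated here.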

Notes

  1. Except for the first epoch, which includes the cuDNN configuration time. For this reason, we start timing from the second epoch.
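A minimal sketch of how such timing can be implemented, assuming a Keras training loop; the callback name and structure are illustrative, not taken from the paper:

```python
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Records per-epoch wall-clock time, discarding the first epoch,
    which absorbs one-off cuDNN configuration overhead."""

    def __init__(self):
        super().__init__()
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        if epoch > 0:  # skip epoch 0: it includes cuDNN setup time
            self.epoch_times.append(time.time() - self._start)

# Hypothetical usage: model.fit(x, y, epochs=5, callbacks=[EpochTimer()])
```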

Author information

Correspondence to Ahmad Hesam.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hesam, A., Vallecorsa, S., Khattak, G., Carminati, F. (2019). Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_32

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science (R0)
