ABSTRACT
We describe the design, deployment, and operation of a computer system built to run deep learning frameworks efficiently. The system consists of 16 IBM POWER9 servers with 4 NVIDIA V100 GPUs each, interconnected with a Mellanox EDR InfiniBand fabric, and a DDN all-flash storage array. The system is tailored toward efficient execution of the IBM Watson Machine Learning enterprise software stack, which combines popular open-source deep learning frameworks. We built a custom management software stack to enable efficient use of the system by a diverse community of users, and we provide guides and recipes for running deep learning workloads at scale utilizing all available GPUs. We demonstrate scaling of PyTorch- and TensorFlow-based deep neural networks to produce state-of-the-art performance results.
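As a concrete illustration of the kind of multi-GPU recipe the abstract refers to, the following is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel; the model, dimensions, and training loop are placeholders for illustration, not the recipes shipped with the system.

```python
# Minimal sketch: multi-GPU data-parallel training with PyTorch DDP.
# Assumes launch via torchrun, which sets RANK/LOCAL_RANK/WORLD_SIZE
# and the rendezvous environment variables for each worker process.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; NCCL is the usual backend for GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = nn.Linear(1024, 10).to(device)        # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(10):                            # placeholder loop, random data
        x = torch.randn(32, 1024, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()   # DDP all-reduces gradients across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On a single node with 4 GPUs, such as the AC922 servers described above, this would be launched with something like `torchrun --nproc_per_node=4 train_ddp.py`; multi-node runs additionally specify the node count and a rendezvous endpoint.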