Abstract
AI and deep learning are experiencing explosive growth in almost every domain involving analysis of big data. Deep learning using Deep Neural Networks (DNNs) has shown great promise for scientific data analysis applications. However, traditional CPU-based sequential computing can no longer meet the requirements of mission-critical applications, which are compute-intensive and require low latency and high throughput. Heterogeneous computing (HGC), in which CPUs are integrated with accelerators such as GPUs and FPGAs, offers unique capabilities to accelerate DNNs. Collaborating researchers at SHREC (the NSF Center for Space, High-performance, and Resilient Computing) at the University of Florida, NERSC (the National Energy Research Scientific Computing Center) at Lawrence Berkeley National Lab, CERN Openlab, Dell EMC, and Intel are studying the application of HGC to scientific problems using DNN models. This paper focuses on the use of FPGAs to accelerate the inferencing stage of the HGC workflow. We present case studies and results from inferencing state-of-the-art DNN models for scientific data analysis, using the Intel Distribution of OpenVINO running on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA. Using the Intel Deep Learning Acceleration (DLA) development suite to optimize existing FPGA primitives and develop new ones, we were able to accelerate the scientific DNN models under study by 3× to 6× on a single Arria 10 FPGA relative to a single core (single thread) of a server-class Skylake CPU.
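To make the inference path concrete, the sketch below shows how a model converted offline by the OpenVINO Model Optimizer can be dispatched to the PAC through the Inference Engine's heterogeneous plugin. This is a minimal illustration assuming the 2019-era OpenVINO Python API (IECore/IENetwork); the model file names, input shapes, and synthetic input data are placeholders, not artifacts from the paper.

    # Minimal sketch (not the paper's code): running a DNN converted by the
    # OpenVINO Model Optimizer on the Intel PAC via the Inference Engine.
    # "model.xml"/"model.bin" are placeholder Model Optimizer outputs, e.g.
    # produced offline by: mo_tf.py --input_model frozen_model.pb --data_type FP16
    import numpy as np
    from openvino.inference_engine import IECore, IENetwork

    ie = IECore()
    net = IENetwork(model="model.xml", weights="model.bin")
    input_blob = next(iter(net.inputs))
    output_blob = next(iter(net.outputs))

    # HETERO:FPGA,CPU schedules layers supported by the FPGA (DLA) plugin onto
    # the Arria 10 and falls back to the CPU for any unsupported primitives.
    exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

    # Synthetic input with the network's expected shape, for illustration only.
    batch = np.random.rand(*net.inputs[input_blob].shape).astype(np.float32)
    result = exec_net.infer(inputs={input_blob: batch})
    print(output_blob, result[output_blob].shape)

The HETERO device string keeps primitives that the DLA overlay cannot execute on the CPU, which matches the CPU-fallback behavior of the OpenVINO FPGA plugin described in this workflow.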
Acknowledgement
This research is funded in part by the NSF SHREC Center and the National Science Foundation (NSF) through its IUCRC Program under Grant No. CNS-1738420; and by NSF CISE Research Infrastructure (CRI) Program Grant No. 1405790.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Jiang, C., et al. (2019). Acceleration of Scientific Deep Learning Models on Heterogeneous Computing Platform with Intel® FPGAs. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_44