
A Generic Neural Network Implementation on GPU and Its Performance Benchmark

Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 561)

Abstract

Because Artificial Neural Networks are parallel and computationally intensive by nature, we use GPUs to implement a generic Multilayer Perceptron (MLP) framework and compare its speed to a CPU implementation. The speedup grows with the size of the network but also depends on the hardware used. Three GPUs are tested: the Tesla K80, the Tesla T4, and the Tesla P100. For the largest ANNs tested, speedups ranged from 331.14× on the K80 up to 2379.2× on the P100.
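The paper's implementation is not shown on this page, but the kind of CPU-versus-GPU MLP benchmark the abstract describes can be sketched. The following is a minimal, hypothetical CUDA illustration, not the authors' framework: it times one dense layer's forward pass, out = sigmoid(W·in + b), on both devices and reports the speedup. The kernel name, the 4096-neuron layer sizes, the block size, and the single-layer scope are all illustrative assumptions.

// Minimal sketch (not the authors' code): benchmark one MLP layer's
// forward pass, out = sigmoid(W * in + b), on CPU and GPU.
#include <cstdio>
#include <cmath>
#include <vector>
#include <chrono>
#include <cuda_runtime.h>

// One GPU thread per output neuron: dot product of a weight row with the input.
__global__ void layerForward(const float* W, const float* in, const float* b,
                             float* out, int nIn, int nOut) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nOut) return;
    float sum = b[i];
    for (int j = 0; j < nIn; ++j)
        sum += W[i * nIn + j] * in[j];
    out[i] = 1.0f / (1.0f + expf(-sum));  // sigmoid activation
}

// The same computation on the CPU, as the baseline.
void layerForwardCPU(const float* W, const float* in, const float* b,
                     float* out, int nIn, int nOut) {
    for (int i = 0; i < nOut; ++i) {
        float sum = b[i];
        for (int j = 0; j < nIn; ++j)
            sum += W[i * nIn + j] * in[j];
        out[i] = 1.0f / (1.0f + expf(-sum));
    }
}

int main() {
    const int nIn = 4096, nOut = 4096;  // assumed layer sizes
    std::vector<float> W(nIn * nOut, 0.01f), in(nIn, 1.0f),
                       b(nOut, 0.0f), out(nOut);

    // CPU timing
    auto t0 = std::chrono::high_resolution_clock::now();
    layerForwardCPU(W.data(), in.data(), b.data(), out.data(), nIn, nOut);
    auto t1 = std::chrono::high_resolution_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // GPU buffers and one-time host-to-device transfers
    float *dW, *dIn, *dB, *dOut;
    cudaMalloc(&dW, W.size() * sizeof(float));
    cudaMalloc(&dIn, in.size() * sizeof(float));
    cudaMalloc(&dB, b.size() * sizeof(float));
    cudaMalloc(&dOut, out.size() * sizeof(float));
    cudaMemcpy(dW, W.data(), W.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Warm-up launch so one-time setup cost is not timed
    layerForward<<<(nOut + 255) / 256, 256>>>(dW, dIn, dB, dOut, nIn, nOut);
    cudaDeviceSynchronize();

    // GPU timing with CUDA events (kernel only; transfers excluded)
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    layerForward<<<(nOut + 255) / 256, 256>>>(dW, dIn, dB, dOut, nIn, nOut);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpuMs = 0.0f;
    cudaEventElapsedTime(&gpuMs, start, stop);

    printf("CPU: %.3f ms  GPU: %.3f ms  speedup: %.1fx\n",
           cpuMs, gpuMs, cpuMs / gpuMs);
    cudaFree(dW); cudaFree(dIn); cudaFree(dB); cudaFree(dOut);
    return 0;
}

Note that timing only the kernel, as the events do here, favors the GPU; including host-device transfers, as a full training loop must, narrows the gap for small layers. This is consistent with the abstract's observation that the speedup grows with network size.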



Author information

Correspondence to Tristan Udby.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Udby, T., Tian, Y. (2023). A Generic Neural Network Implementation on GPU and Its Performance Benchmark. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. FTC 2022. Lecture Notes in Networks and Systems, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_9
