
A Generic Neural Network Implementation on GPU and Its Performance Benchmark

Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 561)

Abstract

Because Artificial Neural Networks are parallel and computationally intensive by nature, we use GPUs to implement a generic Multilayer Perceptron (MLP) framework and compare its speed to a CPU implementation. The speedup grows with the size of the network but also depends on the hardware used. Three GPUs are tested: the Tesla K80, the Tesla T4, and the Tesla P100. For the largest ANNs tested, speedups ranged from 331.14× on the K80 up to 2379.2× on the P100.
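The paper's implementation is not shown on this page, but the kind of CPU-versus-GPU MLP benchmark the abstract describes can be sketched. The following is a minimal, hypothetical CUDA illustration, not the authors' framework: it times one dense layer's forward pass, out = sigmoid(W·in + b), on both devices and reports the speedup. The kernel name, the 4096-neuron layer sizes, the block size, and the single-layer scope are all illustrative assumptions.

// Minimal sketch (not the authors' code): benchmark one MLP layer's
// forward pass, out = sigmoid(W * in + b), on CPU and GPU.
#include <cstdio>
#include <cmath>
#include <vector>
#include <chrono>
#include <cuda_runtime.h>

// One GPU thread per output neuron: dot product of a weight row with the input.
__global__ void layerForward(const float* W, const float* in, const float* b,
                             float* out, int nIn, int nOut) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nOut) return;
    float sum = b[i];
    for (int j = 0; j < nIn; ++j)
        sum += W[i * nIn + j] * in[j];
    out[i] = 1.0f / (1.0f + expf(-sum));  // sigmoid activation
}

// The same computation on the CPU, as the baseline.
void layerForwardCPU(const float* W, const float* in, const float* b,
                     float* out, int nIn, int nOut) {
    for (int i = 0; i < nOut; ++i) {
        float sum = b[i];
        for (int j = 0; j < nIn; ++j)
            sum += W[i * nIn + j] * in[j];
        out[i] = 1.0f / (1.0f + expf(-sum));
    }
}

int main() {
    const int nIn = 4096, nOut = 4096;  // assumed layer sizes
    std::vector<float> W(nIn * nOut, 0.01f), in(nIn, 1.0f),
                       b(nOut, 0.0f), out(nOut);

    // CPU timing
    auto t0 = std::chrono::high_resolution_clock::now();
    layerForwardCPU(W.data(), in.data(), b.data(), out.data(), nIn, nOut);
    auto t1 = std::chrono::high_resolution_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // GPU buffers and one-time host-to-device transfers
    float *dW, *dIn, *dB, *dOut;
    cudaMalloc(&dW, W.size() * sizeof(float));
    cudaMalloc(&dIn, in.size() * sizeof(float));
    cudaMalloc(&dB, b.size() * sizeof(float));
    cudaMalloc(&dOut, out.size() * sizeof(float));
    cudaMemcpy(dW, W.data(), W.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Warm-up launch so one-time setup cost is not timed
    layerForward<<<(nOut + 255) / 256, 256>>>(dW, dIn, dB, dOut, nIn, nOut);
    cudaDeviceSynchronize();

    // GPU timing with CUDA events (kernel only; transfers excluded)
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    layerForward<<<(nOut + 255) / 256, 256>>>(dW, dIn, dB, dOut, nIn, nOut);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpuMs = 0.0f;
    cudaEventElapsedTime(&gpuMs, start, stop);

    printf("CPU: %.3f ms  GPU: %.3f ms  speedup: %.1fx\n",
           cpuMs, gpuMs, cpuMs / gpuMs);
    cudaFree(dW); cudaFree(dIn); cudaFree(dB); cudaFree(dOut);
    return 0;
}

Note that timing only the kernel, as the events do here, favors the GPU; including host-device transfers, as a full training loop must, narrows the gap for small layers. This is consistent with the abstract's observation that the speedup grows with network size.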



Author information

Correspondence to Tristan Udby.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Udby, T., Tian, Y. (2023). A Generic Neural Network Implementation on GPU and Its Performance Benchmark. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. FTC 2022. Lecture Notes in Networks and Systems, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_9
