
A comparative study of GPU programming models and architectures using neural networks

The Journal of Supercomputing

Abstract

General-Purpose Graphics Processing Units (GP-GPUs) have recently been identified as a compelling technology for accelerating numerous data-parallel algorithms. Several GPU architectures and programming models are emerging and establishing their niche in the High-Performance Computing (HPC) community. New massively parallel architectures such as Nvidia's Fermi and AMD/ATI's Radeon pack tremendous computing power into their large number of multiprocessors. Their performance is unleashed using one of two GP-GPU programming models: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL). Both offer constructs and features that have a direct bearing on application runtime performance. In this paper, we compare the two GP-GPU architectures and the two programming models using a two-level character recognition network. The two-level network is developed using four different Spiking Neural Network (SNN) models, each with a different ratio of computation to communication requirements. To compare the architectures, we chose the two extremes of the SNN models for implementation of the aforementioned two-level network. An architectural performance comparison of the SNN application running on Nvidia's Fermi and AMD/ATI's Radeon is carried out using the OpenCL programming model, applying all of the optimization strategies applicable to the two architectures. To compare the programming models, we implement the two-level network on Nvidia's Tesla C2050, which is based on the Fermi architecture. We present a hierarchy of implementations in which we successively add optimization techniques associated with the two programming models. We then compare the two programming models at these different levels of implementation and also examine the effect of the network size (problem size) on performance.
We report significant application speed-ups, as high as 1095× for the most computation-intensive SNN neuron model, against a serial implementation on an Intel Core 2 Quad host. The comprehensive study presented in this paper establishes connections between programming models, architectures, and applications.
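For concreteness, the computation-to-communication trade-off described above can be illustrated with the Izhikevich model, one of the widely used SNN neuron models in this line of work and the lightest in per-neuron computation. The sketch below is our own minimal Python/NumPy illustration, not the authors' implementation: the function name, the Euler time step `dt`, the 30 mV spike threshold, and the default parameters `a, b, c, d` (Izhikevich's standard regular-spiking values) are all assumptions made here for demonstration.

```python
import numpy as np

def izhikevich_step(v, u, I, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
    """One forward-Euler step of the Izhikevich spiking-neuron model,
    vectorized over a population of neurons.

    v : membrane potentials (mV)
    u : membrane recovery variables
    I : input currents
    Returns the updated (v, u) and a boolean array of neurons that
    spiked (crossed the 30 mV threshold) on entry to this step.
    """
    fired = v >= 30.0                       # spike detection
    v = np.where(fired, c, v)               # reset fired neurons
    u = np.where(fired, u + d, u)
    # Izhikevich (2003) dynamics: v' = 0.04v^2 + 5v + 140 - u + I
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I
    du = a * (b * v - u)                    # u' = a(bv - u)
    return v + dt * dv, u + dt * du, fired
```

A per-neuron update like this is embarrassingly parallel and maps one neuron to one GPU work-item; heavier models (e.g. conductance-based ones) multiply the arithmetic in the `dv`/`du` lines while the data exchanged between layers stays roughly the same, which is what shifts the computation-to-communication ratio across the four SNN models compared in the paper.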



Corresponding author

Correspondence to Melissa C. Smith.


Cite this article

Pallipuram, V.K., Bhuiyan, M. & Smith, M.C. A comparative study of GPU programming models and architectures using neural networks. J Supercomput 61, 673–718 (2012). https://doi.org/10.1007/s11227-011-0631-3
