
A comparative study of GPU programming models and architectures using neural networks

The Journal of Supercomputing

Abstract

General-Purpose Graphics Processing Units (GP-GPUs) have recently been identified as a compelling technology for accelerating numerous data-parallel algorithms. Several GPU architectures and programming models are emerging and establishing their niche in the High-Performance Computing (HPC) community. New massively parallel architectures such as Nvidia's Fermi and AMD/ATI's Radeon pack tremendous computing power into their large number of multiprocessors. Their performance is unleashed using one of two GP-GPU programming models: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL). Both offer constructs and features that have a direct bearing on application runtime performance. In this paper, we compare the two GP-GPU architectures and the two programming models using a two-level character recognition network. The two-level network is developed using four different Spiking Neural Network (SNN) models, each with a different ratio of computation to communication requirements. To compare the architectures, we chose the two extremes of the SNN models for implementation of the aforementioned two-level network. An architectural performance comparison of the SNN application running on Nvidia's Fermi and AMD/ATI's Radeon is carried out using the OpenCL programming model, applying all of the optimization strategies applicable to the two architectures. To compare the programming models, we implement the two-level network on Nvidia's Tesla C2050, which is based on the Fermi architecture. We present a hierarchy of implementations in which we successively add optimization techniques associated with the two programming models. We then compare the two programming models at these different levels of implementation and also examine the effect of the network size (problem size) on performance.
We report significant application speed-ups, as high as 1095× for the most computation-intensive SNN neuron model, against a serial implementation on an Intel Core 2 Quad host. The comprehensive study presented in this paper establishes connections between programming models, architectures, and applications.
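For concreteness, the computation-to-communication trade-off described above can be illustrated with the Izhikevich model, one of the widely used SNN neuron models in this line of work and the lightest in per-neuron computation. The sketch below is our own minimal Python/NumPy illustration, not the authors' implementation: the function name, the Euler time step `dt`, the 30 mV spike threshold, and the default parameters `a, b, c, d` (Izhikevich's standard regular-spiking values) are all assumptions made here for demonstration.

```python
import numpy as np

def izhikevich_step(v, u, I, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
    """One forward-Euler step of the Izhikevich spiking-neuron model,
    vectorized over a population of neurons.

    v : membrane potentials (mV)
    u : membrane recovery variables
    I : input currents
    Returns the updated (v, u) and a boolean array of neurons that
    spiked (crossed the 30 mV threshold) on entry to this step.
    """
    fired = v >= 30.0                       # spike detection
    v = np.where(fired, c, v)               # reset fired neurons
    u = np.where(fired, u + d, u)
    # Izhikevich (2003) dynamics: v' = 0.04v^2 + 5v + 140 - u + I
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I
    du = a * (b * v - u)                    # u' = a(bv - u)
    return v + dt * dv, u + dt * du, fired
```

A per-neuron update like this is embarrassingly parallel and maps one neuron to one GPU work-item; heavier models (e.g. conductance-based ones) multiply the arithmetic in the `dv`/`du` lines while the data exchanged between layers stays roughly the same, which is what shifts the computation-to-communication ratio across the four SNN models compared in the paper.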



Corresponding author

Correspondence to Melissa C. Smith.


Cite this article

Pallipuram, V.K., Bhuiyan, M. & Smith, M.C. A comparative study of GPU programming models and architectures using neural networks. J Supercomput 61, 673–718 (2012). https://doi.org/10.1007/s11227-011-0631-3
