Abstract
Recently, General-Purpose Graphics Processing Units (GP-GPUs) have emerged as a compelling technology for accelerating numerous data-parallel algorithms. Several GPU architectures and programming models are beginning to establish their niche in the High-Performance Computing (HPC) community. Massively parallel architectures such as NVIDIA's Fermi and AMD/ATI's Radeon pack tremendous computing power into their large number of multiprocessors. This power is unleashed through one of two GP-GPU programming models: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), each of which offers constructs and features that bear directly on application runtime performance. In this paper, we compare the two GP-GPU architectures and the two programming models using a two-level character recognition network. The two-level network is developed using four different Spiking Neural Network (SNN) models, each with a different ratio of computation to communication. To compare the architectures, we implement the two-level network using the two extremes of the SNN models. An architectural performance comparison of the SNN application running on NVIDIA's Fermi and AMD/ATI's Radeon is performed using the OpenCL programming model, exhausting all of the optimization strategies plausible for the two architectures. To compare the programming models, we implement the two-level network on NVIDIA's Tesla C2050, which is based on the Fermi architecture. We present a hierarchy of implementations in which we successively add optimization techniques associated with the two programming models. We then compare the two programming models at these different levels of implementation and also examine the effect of the network size (problem size) on performance.
We report significant application speed-up, as high as 1095× for the most computation-intensive SNN neuron model, over a serial implementation on an Intel Core 2 Quad host. The comprehensive study presented in this paper establishes connections between programming models, architectures, and applications.
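To make the notion of an SNN neuron model concrete, the sketch below simulates a single neuron using the Izhikevich model, the simplest of the four SNN models compared in the paper. The parameter values a, b, c, d are the standard "regular spiking" settings from Izhikevich's 2003 formulation; the input current, time step, and step count are illustrative assumptions, not values taken from the paper.

```python
def simulate_izhikevich(I=10.0, steps=1000, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Euler integration of the Izhikevich neuron; returns the spike count.

    v is the membrane potential (mV), u the membrane recovery variable,
    and I the injected input current (illustrative constant drive).
    """
    v, u = c, b * c
    spikes = 0
    for _ in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:        # spike threshold reached: reset v and bump u
            v, u = c, u + d
            spikes += 1
    return spikes

print(simulate_izhikevich())  # sustained drive produces repeated spikes
```

In a full two-level network, each neuron's state update (the computation) is independent and maps naturally onto one GPU work-item, while spike delivery between the two levels constitutes the communication; the four SNN models differ mainly in how expensive the per-neuron update is relative to that communication.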
Pallipuram, V.K., Bhuiyan, M. & Smith, M.C. A comparative study of GPU programming models and architectures using neural networks. J Supercomput 61, 673–718 (2012). https://doi.org/10.1007/s11227-011-0631-3