Abstract
This paper describes the porting of an industrial neural network simulator to GPUs. The simulator is used in a tool-chain that sorts massive amounts of e-mails and other textual data. In contrast to previous work, all steps are executed on the GPU, achieving an overall speedup of up to 33× without using any cuBLAS functionality. All time-consuming routines have been ported to the GPU, i.e. the training, simulation and verification phases, with training being the most time-consuming. It is planned to include these GPU kernels in the product to meet special customer demands.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wafai, M.A., Ahmed, Z., Keller, R., Holzmann, S., Sander, B., Resch, M. (2014). Optimization of Industrial Neural Network Simulators for GPGPUs. In: Chiu, D.K.W., Wang, M., Popescu, E., Li, Q., Lau, R. (eds) New Horizons in Web Based Learning. ICWL 2012. Lecture Notes in Computer Science, vol 7697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43454-3_3
DOI: https://doi.org/10.1007/978-3-662-43454-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43453-6
Online ISBN: 978-3-662-43454-3
eBook Packages: Computer Science, Computer Science (R0)