Abstract
The high-density computing requirements of machine learning (ML) workloads pose a challenging performance bottleneck. Constrained by their sequential instruction execution model, traditional general-purpose processors are poorly suited to efficient ML. In this work, we present an ML system design methodology based on the graphics processing unit (GPU) and the field-programmable gate array (FPGA) to tackle this problem. The core idea of our proposal is that, when designing an ML platform, we leverage the GPU's high-density computing capacity for model training and exploit the FPGA's low latency for model inference. In between, we define a model converter that transforms the model produced by the training module into one usable by the inference module. We evaluate our approach through two use cases: handwritten digit recognition with a convolutional neural network, and prediction of a data center's power usage effectiveness with a deep neural network regression algorithm. The experimental results indicate that our solution takes advantage of the parallel computing capacity of the GPU and the FPGA to significantly improve the efficiency of both training and inference, while preserving accuracy and mean squared error when converting models between the different frameworks.
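
As a concrete illustration of the workflow described above, the sketch below shows the GPU-side training stage for the MNIST use case, assuming a TensorFlow/Keras environment. The network topology, hyperparameters, and file name are illustrative assumptions rather than the exact configuration reported in the paper, and the subsequent conversion and FPGA inference steps are only indicated in comments.

```python
# Minimal sketch of the GPU-side training stage (assumed TensorFlow/Keras setup).
import tensorflow as tf

# Load MNIST (the handwritten-digit use case) and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# A small LeNet-style CNN; layer sizes are illustrative, not the paper's exact topology.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training runs on the GPU automatically when one is visible to TensorFlow.
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))

# The trained model is saved and then handed to the model converter
# (a framework-to-framework converter such as MMdnn) to obtain a
# representation consumable by the FPGA inference engine; that step
# is framework-specific and omitted here.
model.save("mnist_cnn.h5")
```

Keeping training in a mainstream GPU framework and deferring FPGA-specific concerns to the converter is what lets the two accelerators be developed and optimized independently.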

Acknowledgements
This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Research Canada and the Canada Research Chair in Sustainable Smart Eco-Cloud. We would also like to thank Yves Lemieux for his insightful feedback during the research work.