Abstract
The high-density computing requirements of machine learning (ML) workloads pose a challenging performance bottleneck. Constrained by their sequential instruction execution model, traditional general-purpose processors are poorly suited to efficient ML. In this work, we present an ML system design methodology based on the graphics processing unit (GPU) and the field-programmable gate array (FPGA) to tackle this problem. The core idea of our proposal is that, when designing an ML platform, we leverage the GPU's high-density computing capacity for model training and exploit the FPGA's low latency for model inference. In between, we define a model converter that transforms the model produced by the training module into one usable by the inference module. We evaluate our approach through two use cases: handwritten digit recognition with a convolutional neural network, and prediction of a data center's power usage effectiveness with a deep neural network regression algorithm. The experimental results indicate that our solution takes advantage of the parallel computing capacity of the GPU and the FPGA to significantly improve the efficiency of both training and inference, while preserving accuracy and mean squared error when converting models between the different frameworks.
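
As a concrete illustration of the workflow described above, the sketch below shows the GPU-side training stage for the MNIST use case, assuming a TensorFlow/Keras environment. The network topology, hyperparameters, and file name are illustrative assumptions rather than the exact configuration reported in the paper, and the subsequent conversion and FPGA inference steps are only indicated in comments.

```python
# Minimal sketch of the GPU-side training stage (assumed TensorFlow/Keras setup).
import tensorflow as tf

# Load MNIST (the handwritten-digit use case) and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# A small LeNet-style CNN; layer sizes are illustrative, not the paper's exact topology.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training runs on the GPU automatically when one is visible to TensorFlow.
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))

# The trained model is saved and then handed to the model converter
# (a framework-to-framework converter such as MMdnn) to obtain a
# representation consumable by the FPGA inference engine; that step
# is framework-specific and omitted here.
model.save("mnist_cnn.h5")
```

Keeping training in a mainstream GPU framework and deferring FPGA-specific concerns to the converter is what lets the two accelerators be developed and optimized independently.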

Acknowledgements
This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Research Canada and the Canada Research Chair in Sustainable Smart Eco-Cloud. We would also like to thank Yves Lemieux for his insightful feedback during the research work.