ABSTRACT
Machine learning is used in a growing number of artificial intelligence applications. While existing machine learning frameworks mostly target NVIDIA GPUs through CUDA, little research has been dedicated to targeting other devices through open standards such as OpenCL. In this paper, we explain how machine learning applications can exploit OpenCL devices through open standards, and how, by using SYCL, TensorFlow can be extended with custom operations that run on OpenCL devices.
Index Terms
- Accelerated Machine Learning Using TensorFlow and SYCL on OpenCL Devices