
A Novel GPU-Based Efficient Approach for Convolutional Neural Networks with Small Filters

Published in: Journal of Signal Processing Systems

Abstract

In recent years, convolutional neural networks (CNNs), as key components of deep neural networks (DNNs), have achieved great success in the field of computer vision. However, convolution dominates the computation time of DNNs. To improve the efficiency of CNNs, many solutions focusing on training algorithms and parallelism strategies have been proposed. In this paper, unlike traditional GPU-based algorithms, a novel algorithm based on a look-up table is proposed to speed up CNNs with small filters on GPUs. By transforming the complex matrix-multiplication operations of convolution into simple table-based summation operations, the cost of convolution can be considerably reduced. Both building the table and looking up values in it parallelize well on a GPU. Experimental results show that, compared with existing state-of-the-art works, the proposed approach improves the speed of convolution by 20–30% with little accuracy loss.
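The core idea of trading multiplications for table lookups can be illustrated with a minimal sketch. The following Python/NumPy function `lut_conv2d` is a hypothetical illustration, not the authors' GPU implementation: image values are quantized to a fixed number of levels, every product of a filter weight with a quantized value is precomputed into a table, and the convolution's inner loop then reduces to lookups and additions.

```python
import numpy as np

def lut_conv2d(image, kernel, levels=256):
    """Convolve `image` with a small `kernel` using a precomputed
    lookup table instead of per-pixel multiplications.

    Image values are quantized to `levels` discrete values; the table
    stores every (kernel weight x quantized value) product, so the
    sliding-window inner product becomes table lookups plus sums.
    """
    # Quantize the image to integer indices in [0, levels - 1].
    lo, hi = image.min(), image.max()
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    idx = np.round((image - lo) / scale).astype(np.int64)

    kh, kw = kernel.shape
    flat_k = kernel.ravel()
    # table[j, v] = j-th kernel weight * dequantized value v
    values = lo + scale * np.arange(levels)
    table = np.outer(flat_k, values)              # shape (kh*kw, levels)

    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    taps = np.arange(kh * kw)
    for i in range(oh):
        for j in range(ow):
            patch = idx[i:i + kh, j:j + kw].ravel()
            # Multiplication-free inner product: lookups + summation.
            out[i, j] = table[taps, patch].sum()
    return out
```

In this sketch the table has `kh*kw x levels` entries and is built once per filter, so its cost is amortized over all output positions; the accuracy loss mentioned in the abstract corresponds to the quantization step used to index the table.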



Acknowledgments

This work is supported by the National Natural Science Foundation of China under grant No. 61133008, the National High-tech Research and Development Program of China (863 Program) under grant No. 2012AA010905, and the Scientific Research Foundation of the Ministry of Education of China-China Mobile under grant No. MCM20122041. We gratefully acknowledge the support of NVIDIA Corporation with the Titan Z GPU used for this research.

Author information


Corresponding author

Correspondence to Wenbin Jiang.


About this article


Cite this article

Jiang, W., Chen, Y., Jin, H. et al. A Novel GPU-Based Efficient Approach for Convolutional Neural Networks with Small Filters. J Sign Process Syst 86, 313–325 (2017). https://doi.org/10.1007/s11265-016-1129-2
