
VLSI Architectures for the Restricted Boltzmann Machine

Published: 12 May 2017

Abstract

Neural network (NN) systems are widely used in many important applications, ranging from computer vision to speech recognition. To date, most NN systems are processed on general-purpose processors such as CPUs or GPUs. However, as dataset and network sizes rapidly increase, these software implementations suffer from long training times. To overcome this problem, specialized hardware accelerators are needed to build high-speed NN systems. This article presents an efficient hardware architecture for the restricted Boltzmann machine (RBM), an important class of NN systems. Several hardware-level optimizations are applied to improve training speed. As-soon-as-possible and overlapped scheduling are used to reduce latency; compared with a flat design, the proposed RBM architecture achieves a 50% reduction in training time. In addition, an on-the-fly computation scheme reduces the storage requirement for binary and stochastic states by several hundred times. Based on the proposed approach, a 784-2252 RBM design example is then developed for the MNIST handwritten digit recognition dataset. Analysis shows that the VLSI design of the RBM achieves significant improvements in training speed and energy efficiency compared to CPU/GPU-based solutions.
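For context, the computation such an accelerator targets is the contrastive divergence (CD-1) training step of an RBM, which alternates between sampling binary hidden and visible states and accumulating weight-update statistics. Below is a minimal NumPy sketch of that step under the standard binary-RBM formulation; it illustrates the algorithm only, not the paper's hardware architecture, and all function and variable names are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0 : (batch, n_visible) binary visible data
    W  : (n_visible, n_hidden) weight matrix
    b  : (n_visible,) visible biases
    c  : (n_hidden,) hidden biases
    """
    rng = rng or np.random.default_rng(0)

    # Positive phase: hidden activation probabilities, then sampled
    # binary hidden states (the binary/stochastic states the abstract
    # refers to).
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Negative phase: reconstruct the visible units from the sampled
    # hidden states, then recompute the hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + c)

    # Parameter updates from positive-minus-negative statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Example usage on random binary data. Sizes are illustrative: 784
# visible units match an MNIST image, but the hidden size here is not
# necessarily the paper's design example.
rng = np.random.default_rng(0)
v = (rng.random((64, 784)) < 0.5).astype(np.float64)
W = 0.01 * rng.standard_normal((784, 256))
b, c = np.zeros(784), np.zeros(256)
W, b, c = cd1_step(v, W, b, c, rng=rng)

The matrix-vector products and sampling steps in this loop are the kind of operations that the scheduling and on-the-fly state-computation techniques described in the abstract would act on.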



Published in

ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 3
Special Issue on Hardware and Algorithms for Learning On-a-chip and Special Issue on Alternative Computing Systems
July 2017, 418 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3051701
Editor: Yuan Xie

      Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 May 2017
      • Accepted: 1 October 2016
      • Revised: 1 September 2016
      • Received: 1 March 2016
Published in JETC Volume 13, Issue 3

