Abstract
Neural network (NN) systems are widely used in many important applications, ranging from computer vision to speech recognition. To date, most NN systems are executed on general-purpose processors such as CPUs and GPUs. However, as dataset and network sizes grow rapidly, these software implementations suffer from long training times. Specialized hardware accelerators are therefore needed to build high-speed NN systems. This article presents an efficient hardware architecture for the restricted Boltzmann machine (RBM), an important class of NN. Several optimizations at the hardware level are applied to improve training speed. As-soon-as-possible (ASAP) and overlapped scheduling are used to reduce latency; compared with a flat design, the proposed RBM architecture achieves a 50% reduction in training time. In addition, an on-the-fly computation scheme reduces the storage required for binary and stochastic states by several hundredfold. Based on the proposed approach, a 784-2252 RBM design example is then developed for the MNIST handwritten digit recognition dataset. Analysis shows that the VLSI design of the RBM achieves significant improvements in training speed and energy efficiency compared with CPU/GPU-based solutions.
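The training algorithm that RBM accelerators typically implement is contrastive divergence (CD-1). As a point of reference for the operations being scheduled in such hardware, below is a minimal NumPy sketch of one CD-1 step for a binary RBM. The layer sizes match the paper's 784-2252 design example; the learning rate, initialization scale, and all function names are illustrative assumptions, not details from the paper. The sketch samples binary states on demand and discards them, a software analogue of the on-the-fly computation scheme mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes match the paper's 784-2252 design example; the learning rate
# and initialization scale are illustrative choices, not taken from the paper.
n_visible, n_hidden, lr = 784, 2252, 0.1

W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # weight matrix
b_v = np.zeros(n_visible)   # visible-unit biases
b_h = np.zeros(n_hidden)    # hidden-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 update for a single binary training vector v0 of length n_visible."""
    global W, b_v, b_h
    # Positive phase: hidden probabilities, then sampled binary hidden states.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    # The binary states h0 and v1 are generated on the fly and never stored.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(n_visible) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Weight and bias updates from the difference of data and model correlations.
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)

# Example: one update on a random binary input vector.
v0 = (rng.random(n_visible) < 0.5).astype(float)
cd1_step(v0)
```

The matrix-vector products in the positive and negative phases dominate this computation, so they would be the natural candidates for the ASAP and overlapped scheduling that the architecture uses to hide latency across successive training vectors.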