Contrastive divergence for memristor-based restricted Boltzmann machine
Introduction
Memristors have opened a new direction for the advancement of neuromorphic and analog applications. Biologically inspired computation is appealing, and many such methods have recently been researched extensively. Artificial neural networks, one of these methods, have attracted growing attention due to deep learning and its hardware implementation in neuromorphic devices. The pioneering work of Snider (2007) presented neural networks using memristors as synapses, emphasizing easily manufactured, large-scale devices. An associative memory using memristors was first demonstrated in Pershin and Di Ventra (2010), while Pershin and Di Ventra (2014) use memcapacitors as synapses with integrate-and-fire neurons. Many later works have employed memristors to implement synapses in neural networks (Thomas, 2013, Indiveri et al., 2013). Spike-timing-dependent plasticity (STDP) with memristive synapses, which can be used to emulate the auditory system, is discussed in Serrano-Gotarredona et al. (2013). Memristors have also been used in image processing for edge detection (Prodromakis and Toumazou, 2010).
The restricted Boltzmann machine (RBM), used as a building block of deep networks, has shown promising results in general, with the best results achieved on image classification problems (Larochelle and Bengio, 2008). RBMs in deep networks are trained in an unsupervised fashion using contrastive divergence (CD) as the learning algorithm. Since training large-scale RBMs is time consuming, several fast hardware implementations have been proposed in the literature: FPGA-based RBM implementations in Ly and Chow (2010), Kim et al. (2010), and more recently Kim et al. (2014), and a VLSI implementation of the continuous RBM, an RBM with continuous-valued neurons, in Chen et al. (2006). An RBM using digital synapses has been presented in Merolla et al. (2011), but no implementation of RBMs exists for memristor-based synapses.
This paper presents an approach to implementing CD in a single RBM layer that uses memristors as weight edges (synapses). The conductance of a memristor serves as a storage element for a positive real number, which can be read and written using digital pulses. However, since the conductance of an electronic device cannot be negative, a mechanism is needed so that memristors can also represent negative real numbers. Additionally, since memristors are very noisy devices, a realistic memristor model with noise is required to understand how learning behaves under such conditions. This work presents such a mechanism, making only minimal adjustments to the RBM and CD to keep the architecture simple and the resulting system extensible.
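A common way to represent signed weights with non-negative conductances is to use a differential pair of memristors per synapse, taking the effective weight as the difference of the two conductances. The sketch below illustrates this idea; the class name, conductance range, and additive pulse model are illustrative assumptions, not the paper's exact circuit.

```python
import numpy as np

class DifferentialSynapse:
    """Signed weight stored as the difference of two memristor conductances.

    g_pos and g_neg are each bounded in [g_min, g_max]; the effective weight
    is w = g_pos - g_neg, so negative values are representable even though
    each conductance itself is non-negative.
    """

    def __init__(self, g_min=0.0, g_max=1.0, rng=None):
        self.g_min, self.g_max = g_min, g_max
        rng = rng or np.random.default_rng()
        # Random initial conductances within the allowed device range.
        self.g_pos = rng.uniform(g_min, g_max)
        self.g_neg = rng.uniform(g_min, g_max)

    @property
    def weight(self):
        return self.g_pos - self.g_neg

    def apply_update(self, delta):
        """A positive update pulses g_pos upward; a negative one pulses g_neg
        upward. Conductances saturate at g_max, bounding the weight range."""
        if delta >= 0:
            self.g_pos = min(self.g_max, self.g_pos + delta)
        else:
            self.g_neg = min(self.g_max, self.g_neg - delta)
```

With this scheme, the weight is confined to [g_min - g_max, g_max - g_min], which also motivates range-limited weight initialization later in the paper.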
This paper is organized as follows. Section 2 introduces the background of the RBM, followed by descriptions of CD and of resistive random-access memory (RRAM) memristors. Section 3 describes in detail the proposed RBM architecture and CD for the memristor-based RBM. Section 4 presents the simulation setup and the obtained results. Finally, Section 5 concludes the paper.
Background
Boltzmann machines (BMs) are bidirectionally connected networks of stochastic processing units, which can be interpreted and trained as neural networks. In general, BMs are difficult and time consuming to train. Imposing certain restrictions on the network structure of a BM yields an RBM, which is easier and faster to train. From the perspective of neural networks, an RBM is a generative stochastic neural network that maximizes the log probability of its training set.
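For reference, one step of CD-1 for a binary RBM, as performed in software simulations, can be sketched as follows; the function names and batch convention are our own, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, rng, epsilon=0.01):
    """One CD-1 update for a binary RBM.

    W: (n_visible, n_hidden) weight matrix; b, c: visible/hidden biases.
    v0: batch of binary visible vectors, shape (batch, n_visible).
    Returns the updated (W, b, c).
    """
    # Positive phase: sample hidden units conditioned on the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Contrastive divergence update: data correlations minus model correlations.
    batch = v0.shape[0]
    W += epsilon * (v0.T @ ph0 - v1.T @ ph1) / batch
    b += epsilon * (v0 - v1).mean(axis=0)
    c += epsilon * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

In a memristor-based realization, the weight increments computed here would be delivered as programming pulses to the synaptic devices rather than as floating-point additions.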
Memristor-based RBM
In general, the weights of an RBM are real numbers that may range from negative to positive infinity, although in practice these limits are never reached. One crucial aspect of inducing learning in neural networks is weight initialization: the weights must be initialized randomly, preferably within a specific range. The following subsection presents an implementation of CD on a memristor-based RBM, followed by details of the weight initialization algorithm.
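As one concrete example of range-limited random initialization, the uniform heuristic of Glorot and Bengio (2010), which the paper cites, can be applied to an RBM weight matrix as sketched below; the exact range used by the paper's initialization algorithm may differ.

```python
import numpy as np

def init_weights(n_visible, n_hidden, rng=None):
    """Random weight initialization in a bounded range.

    Uses the uniform heuristic w ~ U(-r, r) with
    r = 4 * sqrt(6 / (n_visible + n_hidden)), a range commonly suggested
    for networks with logistic units.
    """
    rng = rng or np.random.default_rng()
    r = 4.0 * np.sqrt(6.0 / (n_visible + n_hidden))
    return rng.uniform(-r, r, size=(n_visible, n_hidden))
```

A bounded initialization range is particularly convenient here, since the differential conductance of a memristor pair can only represent weights within a fixed interval anyway.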
Simulation and results
To verify the presented architecture, an RBM was trained and tested on a standard character recognition task using the MNIST dataset, which comprises 60,000 training images and 10,000 testing images. The memristors used in this work switch from the low-resistance state (LRS) to the high-resistance state (HRS) in around 200 pulses, which effectively dictates the learning rate epsilon of (11). Therefore, instead of using the entire MNIST training set, only 10,000 training images were chosen randomly, with an equal number of training images per class.
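A class-balanced random subset such as the one described above can be drawn as follows; this is a generic sketch, not the paper's exact sampling code.

```python
import numpy as np

def balanced_subset(labels, per_class, rng=None):
    """Return indices of a class-balanced random subset of a labeled dataset,
    e.g. 1,000 images per digit for a 10,000-image MNIST training subset."""
    rng = rng or np.random.default_rng()
    idx = []
    for cls in np.unique(labels):
        # All positions belonging to this class, then a random sample of them.
        cls_idx = np.flatnonzero(labels == cls)
        idx.append(rng.choice(cls_idx, size=per_class, replace=False))
    return np.concatenate(idx)
```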
Conclusions
This work presented a mechanism for implementing an RBM with CD as the learning algorithm in neural networks that use memristors as synapses, or weight edges. The technique presented in this paper was designed to mimic the basic CD cycle performed in software simulations of RBMs, keeping the architecture simple and extensible. The results achieved in this work do not attain the maximum performance of state-of-the-art approaches, which are designed to use real numbers as weights.
Acknowledgments
This work was partly supported by the ICT R&D program of MSIP/IITP (14-824-09-002, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis), and by the Pioneer Research Center Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (Grant number 2012-0009462).
References
- Pershin, Y.V., Di Ventra, M., 2010. Experimental demonstration of associative memory with memristive neural networks. Neural Netw.
- Bengio, Y., 2012. Practical recommendations for gradient-based training of deep architectures. CoRR...
- Chen et al., 2006. Continuous-valued probabilistic behavior in a VLSI generative model. IEEE Trans. Neural Netw.
- Fort, A., Cortigiani, F., Rocchi, S., Vignoli, V., 2003. Very high-speed true random noise generator. Analog Integr...
- Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks. In: AISTATS...
- Hinton, G.E., 2002. Training products of experts by minimizing contrastive divergence. Neural Comput.
- et al., 1997. An integrated analog/digital random noise source. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl.
- Hopfield, J.J., 1982. Neural networks and physical systems with emergent collective computational abilities. Proc...
- Indiveri, G., Linares-Barranco, B., Legenstein, R., Deligeorgis, G., Prodromakis, T., 2013. Integration of nanoscale...
- Jo, M., Seong, D., Kim, S., Lee, J., Lee, W., Park, J.B., Park, S., Jung, S., Shin, J., Lee, D., 2010. Novel...