Loading [a11y]/accessibility-menu.js
Hardware-Friendly Actor-Critic Reinforcement Learning Through Modulation of Spike-Timing-Dependent Plasticity | IEEE Journals & Magazine | IEEE Xplore

Hardware-Friendly Actor-Critic Reinforcement Learning Through Modulation of Spike-Timing-Dependent Plasticity


Abstract:

In this work, we propose a hardware-friendly reinforcement learning algorithm. The learning algorithm is based on an actor-critic structure implemented with spiking neura...Show More

Abstract:

In this work, we propose a hardware-friendly reinforcement learning algorithm. The learning algorithm is based on an actor-critic structure implemented with spiking neural networks (SNNs). A biologically plausible and hardware-friendly spike-timing-dependent plasticity learning rule is formulated and employed in the training of SNNs. Several important aspects of applying the learning rule in a reinforcement learning context is studied, especially from the circuit designers' point of view. Pitfalls of potential noise mixing and correlated spikes are identified and properly addressed. To feature a low-power learning architecture, techniques such as down-sampling data for certain learning blocks, injecting quantization noise as noisy residues in neurons, and proper memory partitioning are proposed. A 1-D state-value function learning problem and a 2-D maze walking problem are examined in this paper to illustrate effectiveness of the proposed algorithm and learning rules. A low-power hardware architecture is proposed and examples are implemented with Verilog. Hardware complexity of the proposed algorithm is analyzed, and potential solutions to breaking memory bottleneck when the size of the problem gets large is also discussed.
Published in: IEEE Transactions on Computers ( Volume: 66, Issue: 2, 01 February 2017)
Page(s): 299 - 311
Date of Publication: 27 July 2016

ISSN Information:

Funding Agency:


Contact IEEE to Subscribe

References

References is not available for this document.