Abstract
Machine learning algorithms, and reinforcement learning (RL) in particular, have proved very successful in recent years, achieving super-human performance on many different tasks, from video games and board games to complex cognitive tasks such as path planning or Theory of Mind (ToM) in artificial agents. Nonetheless, this super-human performance is also super-artificial. Although some metrics exceed what a human can achieve (e.g. cumulative reward), on less commonly reported metrics (e.g. time to reach the learning asymptote) performance is significantly worse. Moreover, the means by which these results are achieved do little to extend our understanding of the human or mammalian brain, and most of the approaches used rely on black-box optimization, making any comparison beyond raw performance (e.g. at the architectural level) difficult. In this position paper, we review the origins of reinforcement learning and propose its extension with models of learning derived from fear and avoidance behaviors. We argue that avoidance-based mechanisms are required when training embodied, situated systems to ensure fast and safe convergence, and that they may overcome some of the current limitations of the RL paradigm.
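To make the final claim concrete, the following is a minimal sketch, not the authors' implementation, of how a standard temporal-difference learner could be coupled with a separately learned avoidance value. All names and parameters here (AvoidanceQLearner, alpha_p, kappa, the punishment signal) are illustrative assumptions introduced for this example only.

```python
import numpy as np

# Hypothetical sketch: tabular Q-learning augmented with a separate
# avoidance value A(s, a) learned from aversive (punishment) signals.
# This is an illustration of the general idea, not the paper's model.
class AvoidanceQLearner:
    def __init__(self, n_states, n_actions,
                 alpha_r=0.1, alpha_p=0.2, gamma=0.95, kappa=1.0):
        self.Q = np.zeros((n_states, n_actions))  # appetitive (reward) values
        self.A = np.zeros((n_states, n_actions))  # aversive (avoidance) values
        self.alpha_r, self.alpha_p = alpha_r, alpha_p
        self.gamma, self.kappa = gamma, kappa

    def act(self, s):
        # Action selection trades expected reward against expected harm.
        return int(np.argmax(self.Q[s] - self.kappa * self.A[s]))

    def update(self, s, a, reward, punishment, s_next):
        # Standard TD update for the appetitive value ...
        td_r = reward + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha_r * td_r
        # ... and a separate TD update for the aversive value.
        td_p = punishment + self.gamma * self.A[s_next].max() - self.A[s, a]
        self.A[s, a] += self.alpha_p * td_p
```

In this sketch the aversive learning rate is set higher than the appetitive one (alpha_p > alpha_r), so that dangerous state-action pairs are suppressed after only a few experiences, which is one plausible reading of the "fast and safe convergence" that avoidance mechanisms are argued to provide for embodied systems.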
Acknowledgements
This work is supported by the European Research Council's CDAC project, "The Role of Consciousness in Adaptive Behavior: A Combined Empirical, Computational and Robot-based Approach" (ERC-2013-ADG 341196).
Cite this paper
Puigbò, J.-Y., Arsiwalla, X.D., Verschure, P.F.M.J. (2018). Challenges of Machine Learning for Living Machines. In: Vouloutsi, V., et al. (eds.) Biomimetic and Biohybrid Systems. Living Machines 2018. Lecture Notes in Computer Science, vol. 10928. Springer, Cham. https://doi.org/10.1007/978-3-319-95972-6_41