Abstract
The matching law is one of the basic empirical laws of decision theory. It states that a subject's preference among the available options depends on which choices are reinforced. In this paper, we study possible mechanisms that explain why subjects' decisions often obey this law. Building on reinforcement learning theory, we put forward a decision-making model in which the policy is updated through a policy parameter; the model might be implemented in the brain by the neural circuit formed by the prefrontal cortex and the basal ganglia. From this model, an algorithm that satisfies the matching law is derived under some simple assumptions. Theoretical analysis and simulation results show that the decision behavior produced by the algorithm obeys the matching law. In addition, the matching behaviors observed in two classical experiments are reproduced using the algorithm. Our results offer a plausible strategy that accounts for the matching law and a useful computational tool for rewarded decision-making tasks.
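For reference, the matching law is commonly written in the following form, usually attributed to Herrnstein. This is a sketch of the standard statement; the symbols $B_i$ (number of responses allocated to alternative $i$) and $R_i$ (number of reinforcements obtained from alternative $i$) are notation assumed here for illustration and are not necessarily the notation used in the paper.

```latex
% Two-alternative form: the fraction of responses matches the fraction of reinforcements.
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}

% Generalization to n alternatives (same assumed symbols).
\frac{B_i}{\sum_{j=1}^{n} B_j} = \frac{R_i}{\sum_{j=1}^{n} R_j}, \qquad i = 1, \dots, n
```

As a quick arithmetic check under these definitions, a subject who obtains 30 reinforcements from the first alternative and 10 from the second would, under exact matching, allocate $30/(30+10) = 0.75$ of its responses to the first alternative.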
Cite this article
Cheng, Z., Zhang, Y. & Deng, Z. A stochastic policy search model for matching behavior. Sci. China Inf. Sci. 54, 1430–1443 (2011). https://doi.org/10.1007/s11432-011-4304-x
DOI: https://doi.org/10.1007/s11432-011-4304-x