A stochastic policy search model for matching behavior

  • Research Papers
Science China Information Sciences

Abstract

The matching law is one of the basic empirical laws in decision theory. It states that a subject’s preference among the available options depends on how often each option is reinforced. In this paper, we study possible mechanisms that explain why subjects’ decisions often obey this law. On the basis of reinforcement learning theory, we put forward a decision-making model in which the policy is updated through a policy parameter; the model might be implemented in the brain by the neural circuit linking the prefrontal cortex and the basal ganglia. From this model, we derive an algorithm that satisfies the matching law under some simple assumptions. Theoretical analysis and simulation results show that the decision behavior produced by the algorithm obeys the matching law. In addition, the algorithm reproduces the matching behaviors observed in two classical experiments. Our results provide a plausible strategy that accounts for the matching law and a useful computational tool for rewarded decision-making tasks.
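
As a concrete reading of the matching law stated above, the fraction of choices allocated to each option should equal the fraction of total reward obtained from that option. The following minimal sketch checks this on observed counts. It is not the algorithm derived in the paper; the function names `fractions` and `matching_deviation` and the example counts are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def fractions(counts: Sequence[float]) -> List[float]:
    """Normalize non-negative counts so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def matching_deviation(choices: Sequence[float], rewards: Sequence[float]) -> float:
    """Largest absolute gap between choice fractions and reward fractions.

    A value of 0 means the counts satisfy the matching law exactly;
    larger values indicate a departure from matching.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    choice_frac = fractions(choices)
    reward_frac = fractions(rewards)
    return max(abs(c - r) for c, r in zip(choice_frac, reward_frac))


# Illustrative (made-up) counts for two options:
# option A was chosen 300 times and rewarded 60 times,
# option B was chosen 100 times and rewarded 20 times.
deviation = matching_deviation([300, 100], [60, 20])
print(deviation)
```

With these illustrative counts, both the choice fractions and the reward fractions are 0.75 and 0.25, so the subject would be matching exactly. An even split of choices over the same rewards would give a nonzero deviation instead.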

Author information

Corresponding author

Correspondence to ZhiDong Deng.

About this article

Cite this article

Cheng, Z., Zhang, Y. & Deng, Z. A stochastic policy search model for matching behavior. Sci. China Inf. Sci. 54, 1430–1443 (2011). https://doi.org/10.1007/s11432-011-4304-x

  • DOI: https://doi.org/10.1007/s11432-011-4304-x
