Alternative time representation in dopamine models


Abstract

Dopaminergic neuron activity has been modeled during learning and appetitive behavior, most commonly using the temporal-difference (TD) algorithm. However, a proper representation of elapsed time and of the exact task is usually required for the model to work. Most models use timing elements such as delay-line representations of time that are not biologically realistic for intervals in the range of seconds. The interval-timing literature provides several alternatives. One of them is that timing could emerge from general network dynamics, instead of coming from a dedicated circuit. Here, we present a general rate-based learning model based on long short-term memory (LSTM) networks that learns a time representation when needed. Using a naïve network learning its environment in conjunction with TD, we reproduce dopamine activity in appetitive trace conditioning with a constant CS-US interval, including probe trials with unexpected delays. The proposed model learns a representation of the environment dynamics in an adaptive biologically plausible framework, without recourse to delay lines or other special-purpose circuits. Instead, the model predicts that the task-dependent representation of time is learned by experience, is encoded in ramp-like changes in single-neuron activity distributed across small neural networks, and reflects a temporal integration mechanism resulting from the inherent dynamics of recurrent loops within the network. The model also reproduces the known finding that trace conditioning is more difficult than delay conditioning and that the learned representation of the task can be highly dependent on the types of trials experienced during training. Finally, it suggests that the phasic dopaminergic signal could facilitate learning in the cortex.





Acknowledgements

We are grateful to Douglas Eck, Aaron Courville, Doina Precup, and many others for discussions during the development of the present work. This manuscript also profited from the comments of Pascal Fortier-Poisson and Elliot Ludvig, as well as from the anonymous reviewers. F.R. was supported by doctoral studentships from the New Emerging Team Grant in Computational Neuroscience (CIHR) and from the Groupe de recherche sur le système nerveux central (FRSQ). Y.B. and J.K. were supported by the CIHR New Emerging Team Grant in Computational Neuroscience and an infrastructure grant from the FRSQ.

Author information


Corresponding author

Correspondence to François Rivest.

Additional information

Action Editor: P. Dayan

Electronic supplementary material

Below are links to the electronic supplementary material.

Supplemental Pseudocode

(DOC 71 kb)

Supplemental Tables and Figures

(DOC 820 kb)

Appendices

Appendix A

1.1 Supplemental material 1: Java simulator

This program is a Java package containing the original simulation software used to perform the present simulations; it can be downloaded from ModelDB (http://senselab.med.yale.edu/modeldb/) at http://senselab.med.yale.edu/ModelDB/ShowModel.asp?model=124329. No publication or commercial derivatives can be made without the author's written consent. Researchers who would like more options to make predictions and plan animal experiments are welcome to contact the author [FR]; model options, environment or task options, as well as options to include other models can be discussed.

1.2 Supplemental material 2: Simulations data

Three MS Access .mdb files containing all recorded data for the present experiment can be downloaded from the Université de Montréal institutional digital repository Papyrus (https://papyrus.bib.umontreal.ca/jspui/) at http://hdl.handle.net/1866/3073. We highly recommend contacting the author [FR] before making any inferences about the data.

The Master database contains the last 20 training blocks with the test blocks before, in between, and after them (there was one test block every 10 training blocks). These are then followed by an extra training block and a final control block. The PreTraining database contains 2 control blocks run on 30 untrained networks. The InitialTraining database contains 2 training blocks run on 30 untrained networks.

Appendix B

1.1 Theoretical p signal

A theoretical p signal can be computed to evaluate the internal model the networks have learned. We make a few assumptions about the networks' internal model of the task, compute a theoretical p signal for that model using TD, and compare it to the networks' p signal.

We will assume that the networks are unable to keep track of time during the intertrial interval between the arrival of a US and the presentation of the next CS, and that they have perfect knowledge of the task within a trial. We modeled this using a 7-state Markov model: the model begins in the intertrial state; when the CS appears, it shifts into a CS state followed by 5 other states representing 200 ms, 400 ms, 600 ms, 800 ms and 1,000 ms after CS onset; the model then automatically returns to the intertrial state. The p neurons learn an estimate of the sum of discounted future rewards, i.e.

$$ p_t \approx \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, $$

where γ = 0.98. To compute the theoretical p signal implied by the Markov model, we simply reran the simulations (without test blocks) using the same TD component as in the paper, but replaced its input by the 7-state model described above. We also decreased the TD learning rate by 1% on each training block to ensure convergence.
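The procedure above amounts to running tabular TD(0) on the 7-state chain. Below is a minimal sketch of such a computation in Java (the language of the supplementary simulator); it is not the original code, and the CS hazard rate during the intertrial interval, the learning-rate constants, and the block/step counts are illustrative assumptions.

import java.util.Random;

public class TheoreticalP {
    public static void main(String[] args) {
        double gamma = 0.98;         // discount factor, as in the paper
        double alpha = 0.1;          // initial TD learning rate (assumed value)
        double[] p = new double[7];  // one value estimate per Markov state
        Random rng = new Random(0);

        for (int block = 0; block < 1000; block++) {
            int s = 0;  // every block starts in the intertrial state
            for (int step = 0; step < 5000; step++) {
                int next;
                double r;
                if (s == 0) {
                    // Intertrial state: the CS arrives at some hazard rate
                    // (the rate used here is an assumption, not the paper's value).
                    next = rng.nextDouble() < 0.1 ? 1 : 0;
                    r = 0.0;
                } else if (s < 6) {
                    // CS state and 200-800 ms states: advance one 200 ms step;
                    // the US (reward of value 1) arrives in the 1,000 ms state.
                    next = s + 1;
                    r = (next == 6) ? 1.0 : 0.0;
                } else {
                    // After the US, return to the intertrial state.
                    next = 0;
                    r = 0.0;
                }
                double delta = r + gamma * p[next] - p[s];  // TD error
                p[s] += alpha * delta;
                s = next;
            }
            alpha *= 0.99;  // decrease the learning rate by 1% per block
        }
        for (int i = 0; i < 7; i++) {
            System.out.printf("state %d: p = %.2f%n", i, p[i]);
        }
    }
}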

This theoretical p signal converged to an expected value of 1.4 during intertrial intervals, increased from 2.0 to 2.1 between CS onset and the usual US arrival (Fig. 5, bottom row, middle panel, dashed line), and dropped to 1.2 in the US (1,000 ms) state. This means that, in general and without further information, the networks could expect their sum of future rewards in a block to be about 1.4. Once a trial has started, signalled by the arrival of the CS, a more precise value that includes the imminent reward of value 1 can be computed, hence the sudden rise in the p estimate. The p value then slowly increases as the expected reward gets closer in time, due to discounting. Finally, on US presentation, the networks can expect less than in the intertrial state, since the US is never closely followed by a reward. The intertrial-state estimate is less precise, since all intertrial steps share the same estimate. In TD, the error is always pushed backward in time; since the only unpredictable event is the arrival of the CS, this is where most of the error signal (δ) appears, i.e., at CS presentation.
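The slow rise between CS onset and the US follows from the TD fixed point. At the fixed point, each state's value satisfies

$$ p_t = r_{t+1} + \gamma\, p_{t+1}, $$

so between CS onset and the US, where r_{t+1} = 0, each 200 ms step multiplies p by 1/γ ≈ 1.02, producing the gradual climb reported above.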

As shown in Fig. 5, the theoretical p signal and the averaged networks' p signal are relatively close to each other.

Appendix C

1.1 Memory block responses

Some of the memory blocks in the 30 successful networks contained elements with unphysiological response patterns. In particular, the activity of some memory cells increased throughout the duration of entire training blocks, mainly because of sustained ramp increases during the intertrial intervals (Supplemental Figure 5). This increase probably would have continued indefinitely had the neurons' activity not been reset between training blocks.

A deeper analysis revealed that memory blocks with such memory cell behaviours did not contribute directly to the network output at time t, even though they had some phasic responses to the task's signals. Of the 60 memory blocks (2 per network), 19 had one of the above patterns. No successful network had two memory blocks with such signals, indicating that it is probably impossible to learn the task with such signals alone. Using a correlation measure between the memory block outputs and the LSTM outputs to evaluate their contribution to network function, 17 of the 19 memory blocks with unusual responses were excluded from the LSTM representation analysis (memory blocks with absolute correlations <0.03 were removed), leaving a total of 43 memory blocks (out of 60) analyzed. None of the 41 memory blocks considered normal were rejected by the correlation test. The finding that a number of networks learned the task even though one of their memory blocks had unphysiological response patterns and did not contribute significantly to network output indicates that one memory block is probably sufficient to learn this classical conditioning task.
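For concreteness, here is a minimal sketch of this screening criterion, assuming per-timestep traces of each memory block's output and of the LSTM output; the class and method names are illustrative, not the original analysis code.

public class BlockScreen {
    // Pearson correlation between two equal-length activity traces.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Appendix C criterion: a memory block whose output correlates with the
    // LSTM output at |r| < 0.03 is dropped from the representation analysis.
    static boolean keepForAnalysis(double[] blockOut, double[] lstmOut) {
        return Math.abs(pearson(blockOut, lstmOut)) >= 0.03;
    }

    public static void main(String[] args) {
        double[] block = {0.1, 0.2, 0.3, 0.4};
        double[] lstm = {0.0, 0.1, 0.2, 0.3};
        System.out.println(keepForAnalysis(block, lstm));  // true (r = 1.0)
    }
}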

In an additional experiment after the main analysis, we ran a small number of networks using a single memory block and a single memory cell. Although the success rate of these networks was much lower (~6%), their final solution matched the one found in Section 3.5. This also suggests that the blocks removed from the analysis were not contributing to the final solution, and that the final solution usually resides within a single memory block.

Appendix D

1.1 Mesocortical performance

In order to verify that the mesocortical speed-up was not simply due to the addition of two constant learning rates, we trained 50 networks for each pair of hyper-parameter values in the range {0.0, 0.0032, 0.01, 0.0316, 0.1, 0.3162, 1.0, 3.1623}² (powers of the square root of 10). We then reduced the space of hyper-parameters by looking only at the number of successful networks (networks still successful on the last training block, Supplemental Figure 6).

With β = 0, at best 34% of the networks were successful (α_lstm = 1.0). Their averaged first successful block (using the 1,000 limit for the 13 networks that failed to have a successful block) was 480. Higher success rates with the mesocortical model (from 50% to 60%) were obtained within the range (α_lstm, β) ∈ {0.0316, 0.1, 0.3162} × {0.3162, 1.0}. In this range, the averaged first successful block of the fastest networks (α_lstm = 0.3162 and β = 1.0) was 260 (only 6 networks failed to have a successful block and were assigned the value 1,000).

Finally, we also looked at whether networks could learn using DA alone as the learning rate (with no basic intrinsic learning rate, α_lstm = 0.0). β = 0.3162 and β = 1.0 gave the best success rates (30% and 34%, respectively), and the fastest networks (β = 1.0) had an averaged first successful block of 616, meaning that in this task DA alone, without a basic intrinsic learning rate, seemed sufficient to learn. We performed a one-way ANOVA on these 3 sets of networks (α_lstm = 1.0 and β = 0.0; α_lstm = 0.3162 and β = 1.0; α_lstm = 0.0 and β = 1.0) and found a significant difference (P < 1.0E-6). A post-hoc Scheffé test showed that the mesocortical model using a basic learning rate was significantly faster than the two other groups.
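For concreteness, the hyper-parameter sweep has the following shape. This is a hypothetical sketch, not the published code; firstSuccessfulBlock is a stand-in for training one network with the simulator of Supplemental material 1.

public class GridSearch {
    // Hyper-parameter values swept in Appendix D: 0 plus powers of sqrt(10).
    static final double[] VALUES =
        {0.0, 0.0032, 0.01, 0.0316, 0.1, 0.3162, 1.0, 3.1623};

    public static void main(String[] args) {
        for (double alphaLstm : VALUES) {
            for (double beta : VALUES) {
                int successes = 0;
                long sumFirstBlock = 0;
                for (int run = 0; run < 50; run++) {
                    int first = firstSuccessfulBlock(alphaLstm, beta, run);
                    if (first < 1000) successes++;
                    sumFirstBlock += first;  // failures enter as the 1,000 limit
                }
                System.out.printf(
                    "alpha_lstm=%.4f beta=%.4f: %d/50 successful, mean first block %.0f%n",
                    alphaLstm, beta, successes, sumFirstBlock / 50.0);
            }
        }
    }

    // Hypothetical stand-in for training one LSTM/TD network and returning
    // its first successful training block (1,000 if none).
    static int firstSuccessfulBlock(double alphaLstm, double beta, int seed) {
        return 1000;  // stub
    }
}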


Cite this article

Rivest, F., Kalaska, J.F. & Bengio, Y. Alternative time representation in dopamine models. J Comput Neurosci 28, 107–130 (2010). https://doi.org/10.1007/s10827-009-0191-1

