Neural coding of categories: information efficiency and optimal population codes

Journal of Computational Neuroscience

Abstract

This paper presents an analytical study of how a large assembly of neurons can code a discrete set of categories. We consider population coding schemes, which can also be seen as instances of the exemplar models proposed in the literature to account for phenomena in the psychophysics of categorization. We quantify coding efficiency by the mutual information between the set of categories and the neural code, and we characterize the properties of the most efficient codes in different regimes, which correspond essentially to different signal-to-noise ratios. A main result is that, in the limit of high signal-to-noise ratio, the Fisher information at the population level should be greatest between categories. This is achieved by having many cells with the stimulus-discriminating parts (steepest slopes) of their tuning curves placed in the transition regions between categories in stimulus space. We show that these properties are in good agreement both with psychophysical data and with the neurophysiology of the inferotemporal cortex in the monkey, a cortical area known to be specifically involved in classification tasks.

References

  • Abbott, L., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11, 91–101.

  • Abramson, A., & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. In Proceedings of the sixth international congress of phonetic sciences. Prague: Academia.

  • Ashby, F., & Spiering, B. (2004). The neurobiology of category learning. Behavioral and Cognitive Neuroscience Reviews, 3(2), 101–113.

  • Averbeck, B., Latham, P., & Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7, 358–366.

  • Blahut, R. E. (1987). Principles and practice of information theory. Boston, MA: Addison-Wesley Longman.

  • Brunel, N., & Nadal, J.-P. (1998). Mutual information, Fisher information, and population coding. Neural Computation, 10, 1731–1757.

  • Butts, D. A., & Goldman, M. S. (2006). Tuning curves, neuronal variability, and sensory coding. PLoS Biology, 4(4), e92.

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

  • Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). New York: Wiley.

  • Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge: MIT Press.

  • Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.

  • Fisher, J., & Principe, J. (1998). A methodology for information theoretic feature extraction. In A. Stuberud (Ed.), Proceedings of the IEEE international joint conference on neural networks. Piscataway: IEEE.

  • Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.

  • Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23, 5235–5246.

  • Georgopoulos, A., Schwartz, A., & Kettner, R. (1986). Neuronal population coding of movement direction. Science, 233, 1416–1419.

  • Goldstone, R. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123(2), 178–200.

  • Green, D., & Swets, J. (1988). Signal detection theory and psychophysics, reprint edition. Los Altos, CA: Peninsula.

  • Guenther, F., Husain, F., Cohen, M., & Shinn-Cunningham, B. (1999). Effects of categorization and discrimination training on auditory perceptual space. Journal of the Acoustical Society of America, 106, 2900–2912.

  • Han, Y., Köver, H., Insanally, M., Semerdjian, J., & Bao, S. (2007). Early experience impairs perceptual discrimination. Nature Neuroscience, 10(9), 1191–1197.

  • Harnad, S. (Ed.) (1987). Categorical perception: The groundwork of cognition. New York: Cambridge University Press.

  • Harnad, S. (2005). Cognition is categorization. In H. Cohen & C. Lefebvre (Eds.), Handbook of categorization. Amsterdam: Elsevier.

  • Hillenbrand, J., Getty, L., Clark, M., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97(5), 3099–3111.

  • Hintzman, D. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93(4), 411–428.

  • Humphreys, G., & Forde, E. (2001). Hierarchies, similarity and interactivity in object recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain Sciences, 24, 453–509.

  • Hung, C., Kreiman, G., Poggio, T., & DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310, 863–866.

  • Jiang, X., Bradley, E., Rini, R., Zeffiro, T., VanMeter, J., & Riesenhuber, M. (2007). Categorization training results in shape- and category-selective human neural plasticity. Neuron, 53, 891–903.

  • Kang, K., Shapley, R., & Sompolinsky, H. (2004). Information tuning of populations of neurons in primary visual cortex. Journal of Neuroscience, 24(13), 3726–3735.

  • Kang, K., & Sompolinsky, H. (2001). Mutual information of population codes and distance measures in probability space. Physical Review Letters, 86(21), 4958–4961.

  • Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 4296–4309.

  • Knoblich, U., Freedman, D., & Riesenhuber, M. (2002). Categorization in IT and PFC: Model and experiments. AI Memo 2002-007. Cambridge, MA: MIT AI Laboratory.

  • Kobatake, E., Wang, G., & Tanaka, K. (1998). Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. Journal of Neurophysiology, 80, 324–330.

  • Koida, K., & Komatsu, H. (2007). Effects of task demands on the responses of color-selective neurons in the inferior temporal cortex. Nature Neuroscience, 10(1), 108–116.

  • Kruschke, J. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.

  • Kuhl, P. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception & Psychophysics, 50(2), 93–107.

  • Kuhl, P., & Padden, D. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73(3), 1003–1010.

  • Li, W., Piech, V., & Gilbert, C. (2004). Perceptual learning and top-down influences in primary visual cortex. Nature Neuroscience, 7(6), 651–658.

  • Liberman, A., Harris, K., Hoffman, H., & Griffith, B. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358–369.

  • Livingston, K., Andrews, J., & Harnad, S. (1998). Categorical perception effects induced by category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 24(3), 732–753.

  • Logothetis, N., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5(5), 552–563.

  • Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.

  • Nadal, J.-P. (1994). Formal neural networks: From supervised to unsupervised learning. In E. Goles & S. Martinez (Eds.), Cellular automata, dynamical systems and neural networks. Mathematics and its applications (Vol. 282, pp. 147–166). Norwell: Kluwer.

  • Nosofsky, R. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology, 115(1), 39–57.

  • Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4(12), 1244–1252.

  • Palmeri, T., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–304.

  • Paradiso, M. (1988). A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.

  • Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.

  • Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497.

  • Pouget, A., Zhang, K., Deneve, S., & Latham, P. (1998). Statistically efficient estimation using population coding. Neural Computation, 10, 373–401.

  • Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge: MIT Press.

  • Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204.

  • Schölkopf, B., Burges, C., & Smola, A. (Eds.) (1999). Advances in kernel methods—support vector learning. Cambridge: MIT Press.

  • Seriès, P., Latham, P., & Pouget, A. (2004). Tuning curve sharpening for orientation selectivity: Coding efficiency and the impact of correlations. Nature Neuroscience, 7(10), 1129–1135.

  • Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences of the United States of America, 90, 10749–10753.

  • Sigala, N. (2004). Visual categorization and the inferior temporal cortex. Behavioural Brain Research, 149, 1–7.

  • Sigala, N., & Logothetis, N. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.

  • Softky, W., & Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. The Journal of Neuroscience, 13(1), 334–350.

  • Sompolinsky, H., Yoon, H., Kang, K., & Shamir, M. (2001). Population coding in neuronal systems with correlated noise. Physical Review E, 64(5), 051904.

  • Stein, R. (1967). The information capacity of nerve cells using a frequency code. Biophysical Journal, 7, 797–826.

  • Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400, 869–873.

  • Tanaka, K. (1996). Inferotemporal cortex and object vision. Annual Review of Neuroscience, 19, 109–139.

  • Taube, J., Muller, R., & Ranck, J. B., Jr. (1990). Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. The Journal of Neuroscience, 10(2), 420–435.

  • Thomas, E., Hulle, M. V., & Vogels, R. (2001). Encoding of categories by noncategory-specific neurons in the inferior temporal cortex. Journal of Cognitive Neuroscience, 13(2), 190–200.

  • Tolhurst, D., Movshon, J., & Dean, A. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23, 775–785.

  • Torkkola, K., & Campbell, W. M. (2000). Mutual information in learning feature transformations. In Proc. 17th international conf. on machine learning (pp. 1015–1022). San Francisco, CA: Morgan Kaufmann.

  • Vogels, R. (1999). Categorization of complex visual images by rhesus monkeys. Part 2: Single-cells study. European Journal of Neuroscience, 11, 1239–1255.

  • Vogels, R., & Orban, G. (1990). How well do response changes of striate neurons signal differences in orientation: A study in the discriminating monkey. The Journal of Neuroscience, 10(11), 3543–3558.

  • Wilson, M., & DeBauche, B. (1981). Inferotemporal cortex and categorical perception of visual stimuli by monkeys. Neuropsychologia, 19(1), 29–41.

  • Yoon, H., & Sompolinsky, H. (1999). The effect of correlations on the Fisher information of population codes. In M. Kearns, S. Solla, & D. Cohn (Eds.), Advances in neural information processing systems 11 (NIPS-11) (pp. 167–173). Cambridge: MIT Press.

  • Young, M., & Yamane, S. (1992). Sparse population coding of faces in the inferotemporal cortex. Science, 256, 1327–1330.

Acknowledgements

This work is part of a project “Acqlang” supported by the French National Research Agency (ANR-05-BLAN-0065-01). LBG acknowledges a fellowship from the Délégation Générale pour l’Armement. JPN is a Centre National de la Recherche Scientifique member. The initial motivation for this work comes from (psycho- and neuro-) computational issues in the perception of phonemes: we thank Sharon Peperkamp and Janet Pierrehumbert for introducing us to this topic and for valuable discussions. LBG is grateful to the members of the Laboratoire de Sciences Cognitives et Psycholinguistique de l’ENS, especially to Emmanuel Dupoux, for numerous and stimulating discussions. We acknowledge useful inputs from the referees, and most especially, we thank one of them for a detailed list of constructive comments.

Author information

Corresponding author

Correspondence to Laurent Bonnasse-Gahot.

Additional information

Action Editor: Jonathan D. Victor

Appendices

Appendix A: Derivation of Eq. (23)

The goal of this appendix and the following one is to derive Eqs. (18), (23) and (28). Remark: given how simple these results are and the common structure they share, we expect that simpler derivations than ours exist.

When N goes to ∞, we expect the mutual information I(μ, r) to converge towards I(μ, x), and we are interested in the first nontrivial correction to this asymptotic limit. We thus compute, for large N, the difference

$$\Delta \equiv I(\mu, \mathbf{x}) - I(\mu, \mathbf{r}) \; \geq 0.$$
(33)

One can write

$$\Delta = - \int d^K\mathbf{x} \, d^N\mathbf{r} \, P(\mathbf{r}|\mathbf{x}) \, \phi(\mathbf{x}|\mathbf{r})$$
(34)

where

$$\phi(\mathbf{x}|\mathbf{r}) \equiv \sum_{\mu=1}^M p(\mathbf{x}) P(\mu|\mathbf{x}) \ln \frac{P(\mu|\mathbf{r})}{P(\mu|\mathbf{x})}$$
(35)
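For completeness, Eqs. (34) and (35) can be recovered from the definitions in a few lines. The sketch below assumes that the neural response depends on the category only through the stimulus, \(P(\mathbf{r}|\mathbf{x},\mu) = P(\mathbf{r}|\mathbf{x})\), so that \(P(\mathbf{r})\,P(\mu|\mathbf{r}) = \int d^K\mathbf{x}\, p(\mathbf{x})\, P(\mu|\mathbf{x})\, P(\mathbf{r}|\mathbf{x})\); it also uses \(\int d^N\mathbf{r}\, P(\mathbf{r}|\mathbf{x}) = 1\). Writing \(\Delta = \mathcal{H}(\mu|\mathbf{r}) - \mathcal{H}(\mu|\mathbf{x})\) in terms of conditional entropies,

$$\begin{aligned} \mathcal{H}(\mu|\mathbf{r}) &= -\int d^N\mathbf{r}\, P(\mathbf{r}) \sum_{\mu=1}^M P(\mu|\mathbf{r}) \ln P(\mu|\mathbf{r}) = -\int d^K\mathbf{x}\, d^N\mathbf{r}\, P(\mathbf{r}|\mathbf{x})\, p(\mathbf{x}) \sum_{\mu=1}^M P(\mu|\mathbf{x}) \ln P(\mu|\mathbf{r}), \\ \mathcal{H}(\mu|\mathbf{x}) &= -\int d^K\mathbf{x}\, p(\mathbf{x}) \sum_{\mu=1}^M P(\mu|\mathbf{x}) \ln P(\mu|\mathbf{x}) = -\int d^K\mathbf{x}\, d^N\mathbf{r}\, P(\mathbf{r}|\mathbf{x})\, p(\mathbf{x}) \sum_{\mu=1}^M P(\mu|\mathbf{x}) \ln P(\mu|\mathbf{x}). \end{aligned}$$

Subtracting the second line from the first gives Eq. (34) with \(\phi\) as defined in Eq. (35).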

We follow the same approach as in Brunel and Nadal (1998). The first step consists in integrating over x. Taking the large N limit, we show that the leading-order term of the right-hand side of Eq. (34) is zero. We then seek the first correction using the Laplace (steepest-descent) method. The last step consists in integrating over r.

We introduce \(G(\mathbf{r}|\mathbf{x})\), defined as

$$G(\mathbf{r}|\mathbf{x}) \equiv \frac{1}{N} \ln P(\mathbf{r}|\mathbf{x})$$
(36)

and assume that it has a single global maximum at \(\mathbf{x} = \mathbf{x}_m(\mathbf{r})\). We can rewrite Eq. (34) in the following way:

$$\Delta = - \int d^K\mathbf{x} \, d^N\mathbf{r} \, e^{N G(\mathbf{r}|\mathbf{x})} \, \phi(\mathbf{x}|\mathbf{r})$$
(37)

Integration over x. In order to integrate Eq. (34) over x, let us first show that

$$\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0$$
(38)

We begin by evaluating

$$P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) = \frac{P(\mathbf{r}|\mu)\, q_{\mu}}{P(\mathbf{r})} - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)$$
(39)
$$\phantom{P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)} = \frac{1}{P(\mathbf{r})} \int d^K\mathbf{x} \, \exp \big(N G(\mathbf{r}|\mathbf{x}) \big) \, \varphi_{\mu}(\mathbf{x}|\mathbf{r})$$
(40)

with

$$ \varphi_{\mu}(\mathbf{x}|\mathbf{r}) = p(\mathbf{x}) \big[P(\mu|\mathbf{x}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \big] $$
(41)

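Using the saddle-point method and assuming that N ≫ 1 and K ≪ N, and noting that \(\varphi_{\mu}\big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0\) by construction so that the leading-order term vanishes, we find: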

$$\begin{aligned} P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) = {} & \frac{1}{P(\mathbf{r})} \exp \big(N G_m(\mathbf{r})\big) \sqrt{\frac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} \\ & \times \frac{1}{2} \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) \end{aligned}$$
(42)

where \(\mathbb{H}(\xi)\) is the Hessian matrix of ξ,

$$ \mathbb{H}_{kl}\big(\xi(\mathbf{x})\big) = \frac{\partial^2\xi(\mathbf{x})}{\partial x_k \partial x_l} $$
(43)

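evaluated at \(\mathbf{x}_m(\mathbf{r})\), and \(G_m(\mathbf{r}) \equiv G\big(\mathbf{r}|\mathbf{x}_m(\mathbf{r})\big)\) denotes the value of G at its maximum.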
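The structure of this second-order saddle-point term can be checked numerically in one dimension. The sketch below is only an illustration under simplifying assumptions that are not part of the derivation: K = 1, an exactly quadratic \(G(x) = -c\,x^2/2\) (so that \(x_m = 0\), \(G_m = 0\), and no cubic terms arise), and a hypothetical test function \(\varphi(x) = 1 - \cos x\), which vanishes at the maximum and has \(\varphi''(0) = 1\). The function names are ours.

```python
import numpy as np

def laplace_second_order(phi_second_derivative, n, curvature=1.0):
    """Second-order saddle-point estimate of  integral exp(N G(x)) phi(x) dx
    for G(x) = -curvature * x**2 / 2 (maximum at x_m = 0) and phi(0) = 0."""
    hessian = n * curvature                       # H(-N G) at x_m, here a 1x1 matrix
    gaussian_factor = np.sqrt(2.0 * np.pi / hessian)
    return gaussian_factor * 0.5 * phi_second_derivative / hessian

def numeric_integral(n, curvature=1.0, half_range=2.0, points=200001):
    """Brute-force Riemann sum of exp(N G(x)) * (1 - cos x) on a fine grid."""
    x = np.linspace(-half_range, half_range, points)
    dx = x[1] - x[0]
    integrand = np.exp(-0.5 * n * curvature * x**2) * (1.0 - np.cos(x))
    return float(np.sum(integrand) * dx)

ratio_100 = numeric_integral(100) / laplace_second_order(1.0, 100)
ratio_1000 = numeric_integral(1000) / laplace_second_order(1.0, 1000)
print(ratio_100, ratio_1000)   # both close to 1, the second closer
```

The ratio approaches 1 as N grows, which is the sense in which the trace term gives the first nonvanishing contribution when the integrand is zero at the saddle point.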

Since

$$ P(\mathbf{r}) = p\big(\mathbf{x}_m(\mathbf{r})\big) \exp \big(N G_m(\mathbf{r})\big) \sqrt{\frac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} $$
(44)

we have

$$P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) = \frac{1}{2\, p\big(\mathbf{x}_m(\mathbf{r})\big)} \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right)$$
(45)

As a result,

$$\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) \equiv \sum_{\mu=1}^M p \big(\mathbf{x}_m(\mathbf{r})\big) \, P\big(\mu| \mathbf{x}_m(\mathbf{r})\big) \ln \frac{P(\mu|\mathbf{r})}{P \big(\mu|\mathbf{x}_m(\mathbf{r})\big)}$$
(46)
$$\phantom{\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big)} = \sum_{\mu=1}^M p \big(\mathbf{x}_m(\mathbf{r})\big) \, P\big(\mu| \mathbf{x}_m(\mathbf{r})\big) \ln \left( 1 + \frac{P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)}{P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)} \right)$$
(47)

According to Eq. (45), \(P(\mu|\mathbf{r})-P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)\) is of order 1/N, which entails, since ln(1 + z) ≈ z when |z| ≪ 1, that

$$\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = \frac{1}{2} \sum_{\mu=1}^M \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right)$$
(48)
$$\phantom{\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big)} = \frac{1}{2} \operatorname{tr} \left( \mathbb{H}\left(\sum_{\mu=1}^M \varphi_\mu(\mathbf{x}|\mathbf{r})\right)\bigg|_{\mathbf{x}_m(\mathbf{r})} \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right)$$
(49)

Now, since \(\sum_{\mu=1}^M \varphi_\mu(\mathbf{x}|\mathbf{r}) = 0\) for every x (the posteriors \(P(\mu|\cdot)\) in Eq. (41) sum to one), its Hessian vanishes, and we have shown that \(\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0\) to this order.

We can return to Eq. (34) and apply the saddle-point method, knowing that \(\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0\). This leads to:

$$\begin{aligned} \Delta = {} & - \int d^N\mathbf{r} \, \exp \big(N G_m(\mathbf{r})\big) \sqrt{\frac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} \\ & \times \frac{1}{2} \operatorname{tr} \left( \mathbb{H}\big(\phi(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) \end{aligned}$$
(50)

Recalling that \(P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \sim O(\frac{1}{N})\), it is straightforward to show that, to leading order:

$$\frac{1}{p\big(\mathbf{x}_m(\mathbf{r})\big)} \frac{\partial^2 \phi(\mathbf{x}|\mathbf{r}) }{\partial x_k \partial x_l} \bigg|_{\mathbf{x}_m(\mathbf{r})} = - \sum_{\mu=1}^M \frac{1}{ P\big(\mu| \mathbf{x}_m(\mathbf{r})\big)} \frac{\partial P(\mu|\mathbf{x})}{\partial x_k} \bigg|_{\mathbf{x}_m(\mathbf{r})} \frac{\partial P(\mu|\mathbf{x})}{\partial x_l} \bigg|_{\mathbf{x}_m(\mathbf{r})}$$
(51)

i.e.,

$$\frac{1}{p\big(\mathbf{x}_m(\mathbf{r})\big)} \mathbb{H}\big(\phi(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} = - \sum_{\mu=1}^M \frac{1}{ P\big(\mu| \mathbf{x}_m(\mathbf{r})\big)} \nabla P(\mu|\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})} \nabla P(\mu|\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})}^{\top}$$
(52)

where \(\nabla \xi(\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})}\) is the column vector of the partial derivatives of ξ

$$ \nabla \xi(\mathbf{x}) = (\ldots, \partial\xi(\mathbf{x})/\partial x_k ,\ldots)^{\top} $$
(53)

evaluated at \(\mathbf{x}_m(\mathbf{r})\).

Putting Eqs. (44), (50) and (52) together eventually leads to:

$$ \Delta = \frac{1}{2} \int d^N\mathbf{r} P(\mathbf{r}) \left(\sum_{\mu=1}^M \frac{1}{ P(\mu| \mathbf{x})} \nabla P(\mu|\mathbf{x}) \nabla P(\mu|\mathbf{x})^{\top} \right) \bigg|_{\mathbf{x}_m(\mathbf{r})} : \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\Big|_{\mathbf{x}_m(\mathbf{r})} $$
(54)

where ‘:’ denotes the Frobenius inner product on \(\mathcal{M}_K(\mathbb{R})\), defined as follows:

$$ \forall A,B \in \mathcal{M}_K(\mathbb{R}), \; A:B = \operatorname{tr}(A^{\top}B) = \sum_{k,l} A_{kl}B_{kl}. $$
(55)
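As a quick illustration of Eq. (55), the following minimal sketch checks on two arbitrary 2×2 matrices (chosen here only for the example) that the trace form and the element-wise sum agree. The helper name `frobenius_inner` is ours.

```python
import numpy as np

def frobenius_inner(a, b):
    """Frobenius inner product A:B = sum_{k,l} A_kl B_kl."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * b))

# Arbitrary example matrices (illustrative values only)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])

via_trace = float(np.trace(A.T @ B))   # tr(A^T B)
via_sum = frobenius_inner(A, B)        # sum of element-wise products
print(via_trace, via_sum)              # 4.5 4.5
```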

Integration over r. Proceeding as in Brunel and Nadal (1998, pp. 1753–1754) to integrate over r, we get:

$$I(\mu,\mathbf{x}) - I(\mu,\mathbf{r}) = \frac{1}{2} \int d^K\mathbf{x}\; p(\mathbf{x}) \left[ \sum_{\mu=1}^M \frac{1}{P(\mu|\mathbf{x})} \nabla P(\mu|\mathbf{x}) \, \nabla P(\mu|\mathbf{x})^{\top} \right] : F_{\text{code}}^{-1}(\mathbf{x})$$
(56)

where \(F_{\text{code}}(\mathbf{x})\) is the K×K Fisher information matrix of the neuronal population,

$$ \big[F_{\text{code}}(\mathbf{x})\big]_{kl} = - \int d^N\mathbf{r} \; P(\mathbf{r}|\mathbf{x}) \, \frac{\partial^2 \ln P(\mathbf{r}|\mathbf{x}) }{\partial x_k \partial x_l} $$
(57)
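To make Eq. (57) concrete, here is a minimal sketch for a one-dimensional stimulus under assumptions that are ours and not taken from this appendix: cells fire independently with Poisson statistics, and the mean spike count of cell i is a Gaussian tuning curve \(f_i(x)\) plus a small baseline. For such a model Eq. (57) reduces to the standard expression \(F_{\text{code}}(x) = \sum_i f_i'(x)^2 / f_i(x)\). All parameter values are hypothetical.

```python
import numpy as np

def tuning_curves(x, centers, width=0.5, peak=20.0, baseline=0.5):
    """Mean spike counts f_i(x) and slopes f_i'(x) of Gaussian tuning curves
    (hypothetical shape and parameters, chosen for illustration)."""
    bump = peak * np.exp(-0.5 * ((x - centers) / width) ** 2)
    rates = baseline + bump
    slopes = -(x - centers) / width**2 * bump
    return rates, slopes

def fisher_code(x, centers, **kwargs):
    """Fisher information of independent Poisson cells: sum_i f_i'(x)^2 / f_i(x)."""
    rates, slopes = tuning_curves(x, centers, **kwargs)
    return float(np.sum(slopes**2 / rates))

f_100 = fisher_code(0.0, np.linspace(-5.0, 5.0, 100))
f_200 = fisher_code(0.0, np.linspace(-5.0, 5.0, 200))
print(f_200 / f_100)   # close to 2: F_code grows linearly with the number of cells
```

In this toy population the cells that contribute most are those whose tuning curves have their flanks, not their peaks, at the stimulus value, since the contribution of each cell is proportional to its squared slope.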

Noticing that

$$ \sum_{\mu=1}^M \frac{1}{P(\mu|\mathbf{x})} \frac{\partial P(\mu|\mathbf{x})}{\partial x_k} \frac{\partial P(\mu|\mathbf{x})}{\partial x_l} = - \sum_{\mu=1}^M P(\mu|\mathbf{x}) \frac{\partial^2 \ln P(\mu|\mathbf{x}) }{\partial x_k \partial x_l} $$
(58)

we can introduce the K×K Fisher information matrix of the categories, \(F_{\text{cat}}(\mathbf{x})\), characterizing the sensitivity of μ with respect to small variations of x:

$$ \big[F_{\text{cat}}(\mathbf{x})\big]_{kl} = -\sum_{\mu=1}^M P(\mu|\mathbf{x}) \frac{\partial^2 \ln P(\mu|\mathbf{x})}{\partial x_k \partial x_l} $$
(59)
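The identity in Eq. (58) follows from the chain rule together with the normalization \(\sum_{\mu} P(\mu|\mathbf{x}) = 1\). Writing \(P \equiv P(\mu|\mathbf{x})\) for brevity,

$$ P \, \frac{\partial^2 \ln P}{\partial x_k \partial x_l} = \frac{\partial^2 P}{\partial x_k \partial x_l} - \frac{1}{P} \frac{\partial P}{\partial x_k} \frac{\partial P}{\partial x_l}, \qquad \sum_{\mu=1}^M \frac{\partial^2 P(\mu|\mathbf{x})}{\partial x_k \partial x_l} = \frac{\partial^2}{\partial x_k \partial x_l} \sum_{\mu=1}^M P(\mu|\mathbf{x}) = 0, $$

so that summing the first relation over μ leaves only the product of first derivatives, with a minus sign.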

This eventually leads to Eq. (23):

$$ I(\mu,\mathbf{x}) - I(\mu,\mathbf{r}) = \frac{1}{2} \int d^K\mathbf{x} \, p(\mathbf{x}) \, F_{\text{cat}}(\mathbf{x}) : F_{\text{code}}^{-1}(\mathbf{x}) $$
(60)

Since \(F_{\text{code}}\) is of order N, one has in particular that I(μ,x) − I(μ,r) is of order 1/N.
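The 1/N scaling can be illustrated by evaluating the right-hand side of Eq. (60) numerically for a one-dimensional toy problem. Everything specific in the sketch below is an assumption made for the example, not taken from the paper: two equiprobable categories with unit-variance Gaussian stimulus distributions centered at −1 and +1, for which \(P(\mu{=}1|x) = 1/(1+e^{-2x})\) and \(F_{\text{cat}}(x) = 4\,P(1|x)\,P(2|x)\); and a population of independent Poisson cells with evenly spaced Gaussian tuning curves, for which \(F_{\text{code}}(x) = \sum_i f_i'(x)^2/f_i(x)\).

```python
import numpy as np

def posterior_cat1(x):
    """P(mu=1|x) for two equiprobable unit-variance Gaussian categories at +1 and -1."""
    return 1.0 / (1.0 + np.exp(-2.0 * x))

def fisher_cat(x):
    """F_cat(x) = sum_mu P'(mu|x)^2 / P(mu|x), which equals 4 P1 P2 in this toy model."""
    p1 = posterior_cat1(x)
    return 4.0 * p1 * (1.0 - p1)

def stimulus_density(x):
    """p(x): equal mixture of the two category Gaussians."""
    norm = 1.0 / np.sqrt(2.0 * np.pi)
    return 0.5 * norm * (np.exp(-0.5 * (x - 1.0) ** 2) + np.exp(-0.5 * (x + 1.0) ** 2))

def fisher_code(x, n_cells, width=0.5, peak=20.0, baseline=0.5):
    """F_code(x) for independent Poisson cells with evenly spaced Gaussian tuning curves
    (hypothetical parameters)."""
    centers = np.linspace(-6.0, 6.0, n_cells)
    d = x[:, None] - centers[None, :]
    bump = peak * np.exp(-0.5 * (d / width) ** 2)
    slopes = -d / width**2 * bump
    return np.sum(slopes**2 / (baseline + bump), axis=1)

def information_loss(n_cells, grid_size=4001):
    """Right-hand side of Eq. (60) in one dimension, in nats, by a Riemann sum."""
    x = np.linspace(-5.0, 5.0, grid_size)
    dx = x[1] - x[0]
    integrand = stimulus_density(x) * fisher_cat(x) / fisher_code(x, n_cells)
    return float(0.5 * np.sum(integrand) * dx)

loss_100 = information_loss(100)
loss_200 = information_loss(200)
print(loss_200 / loss_100)   # close to 0.5: doubling N halves the loss
```

Because the integrand weights \(1/F_{\text{code}}(x)\) by \(p(x)\,F_{\text{cat}}(x)\), which peaks at the category boundary in this example, the loss is most sensitive to the Fisher information of the code near the boundary.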

Appendix B: Derivation of Eq. (28)

Each cell has an activity \(r_i\) equal to 1 if x is in \([\theta_i, \theta_{i+1}]\), and 0 otherwise. The width of the ith tuning curve is thus \(a_i = \theta_{i+1} - \theta_i\), and we define the preferred stimuli as the centers of the receptive fields, \(x_i \equiv (\theta_i + \theta_{i+1})/2\).

We want to compute

$$ \Delta \equiv I(\mu,x)-I(\mu,\mathbf{r}) = {\cal H}(\mu | \mathbf{r}) - {\cal H}(\mu | x) $$
(61)

where

$${\cal H}(\mu | x) = - \int dx \; p(x) \sum_{\mu=1}^M P(\mu|x) \ln P(\mu|x); $$
$${\cal H}(\mu | \mathbf{r}) = - \int d^N\mathbf{r} \; P(\mathbf{r}) \sum_{\mu=1}^M Q(\mu | \mathbf{r}) \ln Q(\mu | \mathbf{r}) $$

in terms of the \(\{\theta_i,\ i = 1,\ldots,N+1\}\), and to determine the choice of the \(\{\theta_i,\ i = 1,\ldots,N+1\}\), or equivalently of the \(\{x_i,\ i = 1,\ldots,N\}\), that makes this difference Δ as small as possible for a large but finite value of N.

If we set:

$$ \widetilde{P}_i \equiv \int_{\theta_{i}}^{\theta_{i+1}} dx \; p(x) $$
(62)
$$\widetilde{P}_i(\mu) \equiv \int_{\theta_{i}}^{\theta_{i+1}} dx \; p(x |\mu) $$
(63)
$$\widetilde{Q}_{i,\mu} \equiv P(\mu | r_i=1) = \frac{\widetilde{P}_i(\mu) \, q_{\mu}}{\widetilde{P}_i} $$
(64)
$$Q_{\mu}(x) \equiv P(\mu | x) $$
(65)

then we can write Δ from Eq. (61) as:

$$ \Delta = \sum_{i} \, \Delta_i $$
(66)

with

$$ \Delta_i = \int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \Big[ \mathcal{H}(\{ \widetilde{Q}_{i,\mu}\}) - \mathcal{H}(\{ Q_{\mu}(x)\}) \Big] $$
(67)

where \(\mathcal{H}(\{Q_{\mu}\}_{\mu=1}^M )\) is the mixing entropy:

$$ \mathcal{H}\left(\{Q_{\mu}\}_{\mu=1}^M\right) \; = \; - \; \displaystyle\sum_{\mu=1}^M \, Q_{\mu} \ln Q_{\mu} $$
(68)
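A minimal sketch of the mixing entropy of Eq. (68), in nats, using the usual convention 0 ln 0 = 0 (the function name is ours):

```python
import numpy as np

def mixing_entropy(q):
    """H({Q_mu}) = -sum_mu Q_mu ln Q_mu, with the convention 0 ln 0 = 0."""
    q = np.asarray(q, dtype=float)
    positive = q[q > 0]                  # zero-probability categories contribute nothing
    return float(-np.sum(positive * np.log(positive)))

h_pure = mixing_entropy([1.0, 0.0])      # a single category is certain
h_mixed = mixing_entropy([0.5, 0.5])     # two equally likely categories
print(h_pure, h_mixed)
```

The entropy vanishes when one category has probability one and is largest, ln M, when the M categories are equally likely.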

The quantity \(\mathcal{H}(\{ Q_{\mu}(x)\})\) is zero on a homogeneous domain, that is, a region of stimulus space where a single category has probability one; hence there is no contribution from intervals included in such a domain. Suppose on the contrary that \(\mathcal{H}(\{ Q_{\mu}(x)\})\) is rapidly varying over the full range under consideration. We then expect the optimal distribution of the \(\theta_i\) to be dense, that is, \(a_i = \theta_{i+1} - \theta_i\) small.

Recalling that \(x_i = (\theta_{i+1} + \theta_i)/2\) and assuming \(a_i \ll 1\),

$$ \begin{aligned} \widetilde{P}_i &= \int_{-a_i/2}^{a_i/2} dz \left[p(x_i) + z \, p'(x_i) + \frac{z^2}{2} p''(x_i) \right] \\ &= a_i \, p(x_i) + \frac{ a_i^3}{24} \, p''(x_i). \end{aligned}$$

Likewise, \( \widetilde{P}_i (\mu) = a_i \, p(x_i|\mu) + \frac{a_i^3}{24} \, p''(x_i|\mu) \). Thus,

$$\widetilde{Q}_{i,\mu} = Q_{\mu}(x_i) + \frac{a_i^2}{24} A_{i,\mu} $$
(69)

with

$$ A_{i,\mu} = Q_{\mu}''(x_i) + 2 \frac{p'(x_i)}{p(x_i)} \; Q_{\mu}'(x_i) $$
(70)
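As a brief check of Eqs. (69) and (70), one can use Bayes' rule in the form \(q_{\mu}\, p(x|\mu) = p(x)\, Q_{\mu}(x)\) and expand the ratio of Eq. (64) to second order in \(a_i\). With all functions evaluated at \(x_i\),

$$\begin{aligned} \widetilde{Q}_{i,\mu} &= \frac{a_i \, p \, Q_{\mu} + \frac{a_i^3}{24} \, (p \, Q_{\mu})''}{a_i \, p + \frac{a_i^3}{24} \, p''} \\ &\simeq Q_{\mu} + \frac{a_i^2}{24} \left[ \frac{(p \, Q_{\mu})''}{p} - \frac{p''}{p} \, Q_{\mu} \right] \\ &= Q_{\mu} + \frac{a_i^2}{24} \left[ Q_{\mu}'' + 2 \, \frac{p'}{p} \, Q_{\mu}' \right], \end{aligned}$$

where the last line uses \((p \, Q_{\mu})'' = p'' Q_{\mu} + 2 p' Q_{\mu}' + p \, Q_{\mu}''\).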

A Taylor expansion then gives:

$$ \begin{aligned} \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) = {} & \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) + \nabla \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) \cdot \left( \cdots \; \frac{a_i^2 A_{i,\mu}}{24} \; \cdots \right)^{\top} \\ & + \frac{1}{2} \left( \cdots \; \frac{a_i^2 A_{i,\mu}}{24} \; \cdots \right) \mathbb{H}_{\mathcal{H}}\big( \{Q_{\mu}(x_i)\} \big) \left( \cdots \; \frac{a_i^2 A_{i,\mu}}{24} \; \cdots \right)^{\top} \end{aligned}$$
(71)

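where \(\nabla \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big)\) and \(\mathbb{H}_{\mathcal{H}}\) are respectively the gradient and the Hessian of \(\mathcal{H}\) evaluated at \(\{Q_{\mu}(x_i)\}\) (see Eqs. (53) and (43)). Since \(\partial \mathcal{H}/\partial Q_{\mu} = -(\ln Q_{\mu} + 1)\) and the Hessian term is of order \(a_i^4\), keeping terms up to second order in \(a_i\) leads to: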

$$ \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) = \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) - \frac{a_i^2}{24} \sum_{\mu=1}^M A_{i,\mu} (\ln Q_{\mu}(x_i) + 1) $$
(72)

Thus we have:

$$ \begin{aligned} \int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) = {} & a_i \, p(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) \\ & + \frac{a_i^3}{24} \left( p''(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) - p(x_i) \sum_{\mu=1}^M A_{i,\mu} \big(\ln Q_{\mu}(x_i) + 1\big) \right) \end{aligned}$$
(73)

Moreover,

$$ \int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) = a_i \, p(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) + \frac{a_i^3}{24} \, \frac{\partial^2}{\partial x^2} \Big[ p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) \Big] \Big|_{x_i} $$
(74)

with

$$ \begin{aligned} \frac{\partial^2}{\partial x^2} \Big[ p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) \Big] \Big|_{x_i} = {} & - 2 \, p'(x_i) \sum_{\mu=1}^M Q_{\mu}'(x_i) \big(\ln Q_{\mu}(x_i) + 1\big) \\ & - p(x_i) \sum_{\mu=1}^M \left\{ Q_{\mu}''(x_i) \big(\ln Q_{\mu}(x_i) + 1\big) + \frac{Q_{\mu}'(x_i)^{2}}{Q_{\mu}(x_i)} \right\} \\ & - p''(x_i) \sum_{\mu=1}^M Q_{\mu}(x_i) \ln Q_{\mu}(x_i) \end{aligned}$$
(75)

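Subtracting Eq. (74) from Eq. (73) and substituting Eqs. (70) and (75), the terms proportional to \(p''(x_i)\,\mathcal{H}\), to \(Q_{\mu}''(x_i)\big(\ln Q_{\mu}(x_i) + 1\big)\) and to \(p'(x_i)\,Q_{\mu}'(x_i)\big(\ln Q_{\mu}(x_i) + 1\big)\) cancel, which leaves: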

$$ \Delta_i = \frac{a_i^3}{24} \, p(x_i) \sum_{\mu=1}^{M} \frac{P'(\mu |x_i)^2}{P(\mu |x_i)} $$
(76)

hence, recognizing in the sum the one-dimensional form of \(F_{\text{cat}}\) (Eqs. (58) and (59)),

$$ \Delta = \sum_i \frac{a_i^3}{24} \;p(x_i) \; F_{\text{cat}}(x_i) $$
(77)
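Eq. (77) can be compared with a direct evaluation of Eqs. (66) and (67) on a toy problem. The sketch below is a numerical illustration under assumptions of ours, not the authors' computation: two equiprobable categories with unit-variance Gaussian stimulus distributions centered at −1 and +1, equal-width cells tiling the interval [−5, 5] (stimuli outside this interval are ignored), and a midpoint rule with a fixed number of sub-points per cell for the integrals.

```python
import numpy as np

def posterior(x):
    """Q_mu(x) = P(mu|x) for the two toy categories, stacked along the first axis."""
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * x))
    return np.stack([p_plus, 1.0 - p_plus])

def density(x):
    """p(x): equal mixture of unit-variance Gaussians centered at +1 and -1."""
    norm = 1.0 / np.sqrt(2.0 * np.pi)
    return 0.5 * norm * (np.exp(-0.5 * (x - 1.0) ** 2) + np.exp(-0.5 * (x + 1.0) ** 2))

def mixing_entropy(q):
    """-sum_mu Q_mu ln Q_mu along the first axis (clipped to avoid log(0))."""
    q = np.clip(q, 1e-300, 1.0)
    return -np.sum(q * np.log(q), axis=0)

def exact_and_approx(n_bins, lo=-5.0, hi=5.0, sub=200):
    """Return (Delta from Eqs. 66-67 by quadrature, Delta from Eq. 77)."""
    edges = np.linspace(lo, hi, n_bins + 1)
    a = edges[1] - edges[0]                               # common cell width a_i
    offsets = (np.arange(sub) + 0.5) / sub * a            # midpoints inside a cell
    x = edges[:-1, None] + offsets[None, :]               # shape (n_bins, sub)
    dx = a / sub
    p = density(x)
    q = posterior(x)                                      # shape (2, n_bins, sub)
    bin_mass = np.sum(p, axis=1) * dx                     # P~_i of Eq. (62)
    q_bin = np.sum(p * q, axis=2) * dx / bin_mass         # Q~_{i,mu} of Eq. (64)
    exact = np.sum(bin_mass * mixing_entropy(q_bin)) - np.sum(p * mixing_entropy(q)) * dx
    centers = 0.5 * (edges[:-1] + edges[1:])
    p_plus = posterior(centers)[0]
    fisher_cat = 4.0 * p_plus * (1.0 - p_plus)            # sum_mu Q'^2 / Q for this model
    approx = np.sum(a**3 / 24.0 * density(centers) * fisher_cat)
    return float(exact), float(approx)

exact_50, approx_50 = exact_and_approx(50)
exact_100, approx_100 = exact_and_approx(100)
print(exact_50 / approx_50, exact_100 / exact_50)
```

With equal-width cells the loss scales as the square of the cell width, so doubling the number of cells divides Δ by about four. Since Eq. (77) weights \(a_i^3\) by \(p(x_i)\,F_{\text{cat}}(x_i)\), narrow cells are most valuable where this product is large, that is, near the category boundary in this example.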

Cite this article

Bonnasse-Gahot, L., Nadal, JP. Neural coding of categories: information efficiency and optimal population codes. J Comput Neurosci 25, 169–187 (2008). https://doi.org/10.1007/s10827-007-0071-5
