Neural coding of categories: information efficiency and optimal population codes

Bonnasse-Gahot, Laurent; Nadal, Jean-Pierre

doi:10.1007/s10827-007-0071-5

Published: 31 January 2008

Volume 25, pages 169–187, (2008)

Journal of Computational Neuroscience

Laurent Bonnasse-Gahot¹ & Jean-Pierre Nadal¹,²

Abstract

This paper deals with the analytical study of coding a discrete set of categories by a large assembly of neurons. We consider population coding schemes, which can also be seen as instances of exemplar models proposed in the literature to account for phenomena in the psychophysics of categorization. We quantify the coding efficiency by the mutual information between the set of categories and the neural code, and we characterize the properties of the most efficient codes, considering different regimes corresponding essentially to different signal-to-noise ratios. One main finding is that, in the high signal-to-noise ratio limit, the Fisher information at the population level should be greatest between categories, which is achieved by having many cells with the stimulus-discriminating parts (steepest slope) of their tuning curves placed in the transition regions between categories in stimulus space. We show that these properties are in good agreement with both psychophysical data and the neurophysiology of the inferotemporal cortex in the monkey, a cortical area known to be specifically involved in classification tasks.

References

Abbott, L., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11, 91–101.
Abramson, A., & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. In Proceedings of the sixth international congress of phonetic sciences. Prague: Academia.
Ashby, F., & Spiering, B. (2004). The neurobiology of category learning. Behavioral and Cognitive Neuroscience Reviews, 3(2), 101–113.
Averbeck, B., Latham, P., & Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7, 358–366.
Blahut, R. E. (1987). Principles and practice of information theory. Boston, MA: Addison-Wesley Longman.
Brunel, N., & Nadal, J.-P. (1998). Mutual information, Fisher information, and population coding. Neural Computation, 10, 1731–1757.
Butts, D. A., & Goldman, M. S. (2006). Tuning curves, neuronal variability, and sensory coding. PLoS Biology, 4(4), e92.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). New York: Wiley.
Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge: MIT Press.
Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.
Fisher, J., & Principe, J. (1998). A methodology for information theoretic feature extraction. In A. Stuberud (Ed.), Proceedings of the IEEE international joint conference on neural networks. Piscataway: IEEE.
Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.
Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 15, 5235–5246.
Georgopoulos, A., Schwartz, A., & Kettner, R. (1986). Neuronal population coding of movement direction. Science, 233, 1416–1419.
Goldstone, R. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123(2), 178–200.
Green, D., & Swets, J. (1988). Signal detection theory and psychophysics, reprint edition. Los Altos, CA: Peninsula.
Guenther, F., Husain, F., Cohen, M., & Shinn-Cunningham, B. (1999). Effects of categorization and discrimination training on auditory perceptual space. Journal of the Acoustical Society of America, 106, 2900–2912.
Han, Y., Köver, H., Insanally, M., Semerdjian, J., & Bao, S. (2007). Early experience impairs perceptual discrimination. Nature Neuroscience, 10(9), 1191–1197.
Harnad, S. (Ed.) (1987). Categorical perception: The groundwork of cognition. New York: Cambridge University Press.
Harnad, S. (2005). Cognition is categorization. In H. Cohen & C. Lefebvre (Eds.), Handbook of categorization. Amsterdam: Elsevier.
Hillenbrand, J., Getty, L., Clark, M., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97(5), 3099–3111.
Hintzman, D. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93(4), 411–428.
Humphreys, G., & Forde, E. (2001). Hierarchies, similarity and interactivity in object recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain Sciences, 24, 453–509.
Hung, C., Kreiman, G., Poggio, T., & DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310, 863–866.
Jiang, X., Bradley, E., Rini, R., Zeffiro, T., VanMeter, J., & Riesenhuber, M. (2007). Categorization training results in shape- and category-selective human neural plasticity. Neuron, 53, 891–903.
Kang, K., Shapley, R., & Sompolinsky, H. (2004). Information tuning of populations of neurons in primary visual cortex. Journal of Neuroscience, 24(13), 3726–3735.
Kang, K., & Sompolinsky, H. (2001). Mutual information of population codes and distance measures in probability space. Physical Review Letters, 86(21), 4958–4961.
Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 4296–4309.
Knoblich, U., Freedman, D., & Riesenhuber, M. (2002). Categorization in IT and PFC: Model and experiments. AI Memo 2002-007. Cambridge, MA: MIT AI Laboratory.
Kobatake, E., Wang, G., & Tanaka, K. (1998). Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. Journal of Neurophysiology, 80, 324–330.
Koida, K., & Komatsu, H. (2007). Effects of task demands on the responses of color-selective neurons in the inferior temporal cortex. Nature Neuroscience, 10(1), 108–116.
Kruschke, J. (1992). Alcove: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.
Kuhl, P. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception & Psychophysics, 50(2), 93–107.
Kuhl, P., & Padden, D. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73(3), 1003–1010.
Li, W., Piech, V., & Gilbert, C. (2004). Perceptual learning and top-down influences in primary visual cortex. Nature Neuroscience, 7(6), 651–658.
Liberman, A., Harris, K., Hoffman, H., & Griffith, B. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358–369.
Livingston, K., Andrews, J., & Harnad, S. (1998). Categorical perception effects induced by category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 24(3), 732–753.
Logothetis, N., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5(5), 552–563.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.
Nadal, J.-P. (1994). Formal neural networks: From supervised to unsupervised learning. In E. Goles & S. Martinez (Eds.), Cellular automata, dynamical systems and neural networks. Mathematics and its applications (Vol. 282, pp. 147–166). Norwell: Kluwer.
Nosofsky, R. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology, 115(1), 39–57.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4(12), 1244–1252.
Palmeri, T., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–304.
Paradiso, M. (1988). A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.
Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.
Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497.
Pouget, A., Zhang, K., Deneve, S., & Latham, P. (1998). Statistically efficient estimation using population coding. Neural Computation, 10, 373–401.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge: MIT Press.
Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204.
Schölkopf, B., Burges, C., & Smola, A. (Eds.) (1999). Advances in kernel methods—support vector learning. Cambridge: MIT Press.
Seriès, P., Latham, P., & Pouget, A. (2004). Tuning curve sharpening for orientation selectivity: Coding efficiency and the impact of correlations. Nature Neuroscience, 7(10), 1129–1135.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences of the United States of America, 90, 10749–10753.
Sigala, N. (2004). Visual categorization and the inferior temporal cortex. Behavioural Brain Research, 149, 1–7.
Sigala, N., & Logothetis, N. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.
Softky, W., & Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. The Journal of Neuroscience, 13(1), 334–350.
Sompolinsky, H., Yoon, H., Kang, K., & Shamir, M. (2001). Population coding in neuronal systems with correlated noise. Physical Review E, 64(5), 051904.
Stein, R. (1967). The information capacity of nerve cells using a frequency code. Biophysical Journal, 7, 797–826.
Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400, 869–873.
Tanaka, K. (1996). Inferotemporal cortex and object vision. Annual Review of Neuroscience, 19, 109–139.
Taube, J., Muller, R., & Ranck, J. B. J. (1990). Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. The Journal of Neuroscience, 10(2), 420–435.
Thomas, E., Hulle, M. V., & Vogels, R. (2001). Encoding of categories by noncategory-specific neurons in the inferior temporal cortex. Journal of Cognitive Neuroscience, 13(2), 190–200.
Tolhurst, D., Movshon, J., & Dean, A. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23, 775–785.
Torkkola, K., & Campbell, W. M. (2000). Mutual information in learning feature transformations. In Proc. 17th international conf. on machine learning (pp. 1015–1022). San Francisco, CA: Morgan Kaufmann.
Vogels, R. (1999). Categorization of complex visual images by rhesus monkeys. Part 2: Single-cells study. European Journal of Neuroscience, 11, 1239–1255.
Vogels, R., & Orban, G. (1990). How well do response changes of striate neurons signal differences in orientation: A study in the discriminating monkey. The Journal of Neuroscience, 10(11), 3543–3558.
Wilson, M., & DeBauche, B. (1981). Inferotemporal cortex and categorical perception of visual stimuli by monkeys. Neuropsychologia, 19(1), 29–41.
Yoon, H., & Sompolinsky, H. (1999). The effect of correlations on the Fisher information of population codes. In M. Kearns, S. Solla, & D. Cohn (Eds.), Advances in neural information processing systems 11 (NIPS-11) (pp. 167–173). Cambridge: MIT Press.
Young, M., & Yamane, S. (1992). Sparse population coding of faces in the inferotemporal cortex. Science, 256, 1327–1330.

Acknowledgements

This work is part of a project “Acqlang” supported by the French National Research Agency (ANR-05-BLAN-0065-01). LBG acknowledges a fellowship from the Délégation Générale pour l’Armement. JPN is a Centre National de la Recherche Scientifique member. The initial motivation for this work comes from (psycho- and neuro-) computational issues in the perception of phonemes: we thank Sharon Peperkamp and Janet Pierrehumbert for introducing us to this topic and for valuable discussions. LBG is grateful to the members of the Laboratoire de Sciences Cognitives et Psycholinguistique de l’ENS, especially to Emmanuel Dupoux, for numerous and stimulating discussions. We acknowledge useful inputs from the referees, and most especially, we thank one of them for a detailed list of constructive comments.

Author information

Authors and Affiliations

Centre d’Analyse et de Mathématique Sociales (CAMS, UMR 8557 CNRS-EHESS), Ecole des Hautes Etudes en Sciences Sociales, 54 bd. Raspail, 75270, Paris Cedex 06, France
Laurent Bonnasse-Gahot & Jean-Pierre Nadal
Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS-ENS-Paris 6-Paris 7), Ecole Normale Supérieure, 24 rue Lhomond, 75231, Paris Cedex 05, France
Jean-Pierre Nadal

Corresponding author

Correspondence to Laurent Bonnasse-Gahot.

Additional information

Action Editor: Jonathan D. Victor

Appendices

Appendix A: Derivation of Eq. (23)

The goal of this appendix and the following one is to derive Eqs. (18), (23) and (28). Remark: given the simplicity and the underlying identity of these results, we do not expect our derivations to be the simplest ones.

When N goes to ∞, we expect the mutual information I(μ, r) to converge towards I(μ, x), and we are interested in the first nontrivial correction to this asymptotic limit. We thus compute, for large N, the difference

$$\Delta \equiv I(\mu, \mathbf{x}) - I(\mu, \mathbf{r}) \; \geq 0.$$

(33)
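The sign of Δ in Eq. (33) follows from the data processing inequality applied to the chain μ → x → r. As a minimal sketch, the toy example below checks this on a small discrete chain; the probability tables are hypothetical numbers chosen only for illustration and do not come from the paper.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

# Hypothetical toy chain mu -> x -> r: two categories, three stimuli, two responses.
q = [0.5, 0.5]                                       # P(mu)
p_x_given_mu = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]    # P(x | mu)
p_r_given_x = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]   # P(r | x)

# Joint P(mu, x), then P(mu, r) obtained by summing over x (r depends on mu only via x).
joint_mu_x = [[q[m] * p_x_given_mu[m][x] for x in range(3)] for m in range(2)]
joint_mu_r = [[sum(joint_mu_x[m][x] * p_r_given_x[x][r] for x in range(3))
               for r in range(2)] for m in range(2)]

delta = mutual_information(joint_mu_x) - mutual_information(joint_mu_r)
print(f"I(mu,x) - I(mu,r) = {delta:.4f} nats")
```

Any other choice of tables with the same chain structure should also give a non-negative difference.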

One can write

$$\Delta = - \int d^K\mathbf{x} \, d^N\mathbf{r} \, P(\mathbf{r}|\mathbf{x}) \, \phi(\mathbf{x}|\mathbf{r})$$

(34)

where

$$\phi(\mathbf{x}|\mathbf{r}) \equiv \sum_{\mu=1}^M p(\mathbf{x}) P(\mu|\mathbf{x}) \ln \frac{P(\mu|\mathbf{r})}{P(\mu|\mathbf{x})}$$

(35)

We follow the same approach as in Brunel and Nadal (1998). The first step consists in integrating over x. Taking the large N limit, we show that the leading-order contribution to the right-hand side of Eq. (34) is zero. We then seek the first correction using the Laplace (steepest-descent) method. The last step consists in integrating over r.

We introduce G(r|x), defined as:

$$G(\mathbf{r}|\mathbf{x}) \equiv \frac{1}{N} \ln P(\mathbf{r}|\mathbf{x})$$

(36)

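and assume that it has a single global maximum at $\mathbf{x} = \mathbf{x}_m(\mathbf{r})$, where it takes the value $G_m(\mathbf{r}) \equiv G\big(\mathbf{r}|\mathbf{x}_m(\mathbf{r})\big)$. We can rewrite Eq. (34) in the following way: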

$$\Delta = - \int d^K\mathbf{x} \, d^N\mathbf{r} \, e^{N G(\mathbf{r}|\mathbf{x})} \, \phi(\mathbf{x}|\mathbf{r})$$

(37)

Integration over x. In order to integrate Eq. (34) over x, let us first show that

$$\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0$$

(38)

We begin by evaluating

$$ P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) = \frac{P(\mathbf{r}|\mu) \, q_{\mu}}{P(\mathbf{r})} - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) $$

(39)

$$ \qquad = \frac{1}{P(\mathbf{r})} \int d^K\mathbf{x} \, \exp \big(N G(\mathbf{r}|\mathbf{x}) \big) \, \varphi_{\mu}(\mathbf{x}|\mathbf{r}) $$

(40)

with

$$ \varphi_{\mu}(\mathbf{x}|\mathbf{r}) = p(\mathbf{x}) \big[P(\mu|\mathbf{x}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \big] $$

(41)

Using the saddle-point method and assuming that N ≫ 1 and K ≪ N, we find:

$$\begin{array}{l} P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \\ \quad = \dfrac{1}{P(\mathbf{r})} \exp \big(N G_m(\mathbf{r})\big) \sqrt{\dfrac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} \\ \qquad \times \dfrac{1}{2} \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) \end{array}$$

(42)

where $\mathbb{H}(\xi)$ is the Hessian matrix of a function ξ,

$$ \mathbb{H}_{kl}\big(\xi(\mathbf{x})\big) = \frac{\partial^2\xi(\mathbf{x})}{\partial x_k \partial x_l} $$

(43)

evaluated at $\mathbf{x}_m(\mathbf{r})$.

Since

$$ P(\mathbf{r}) = p\big(\mathbf{x}_m(\mathbf{r})\big) \exp \big(N G_m(\mathbf{r})\big) \sqrt{\frac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} $$

(44)

we have

$$\begin{array}{l} P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \\ \quad = \dfrac{1}{2 \, p\big(\mathbf{x}_m(\mathbf{r})\big)} \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) \end{array}$$

(45)
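The leading-order Laplace estimate behind Eq. (44) can be checked numerically in one dimension (K = 1). The sketch below is only an illustration under stated assumptions: the prior `prior`, the toy function `G` and the location `X_M` are hypothetical choices, with G chosen so that G_m = 0 and G''(x_m) = −1, which makes det ℍ(−NG) equal to N.

```python
import math

X_M = 0.3  # hypothetical location of the single maximum of G

def prior(x):
    """Standard normal density, standing in for p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def G(x):
    """Toy (1/N) log-likelihood with G(X_M) = 0 and G''(X_M) = -1."""
    u = x - X_M
    return -0.5 * u * u - 0.25 * u ** 4

def exact_integral(n_neurons, lo=-6.0, hi=6.0, steps=20_000):
    """Midpoint-rule estimate of the integral of p(x) exp(N G(x)) over x."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += prior(x) * math.exp(n_neurons * G(x))
    return h * total

def laplace_estimate(n_neurons):
    """Eq. (44) for K = 1: p(x_m) exp(N G_m) sqrt(2 pi / (N |G''(x_m)|))."""
    return prior(X_M) * math.sqrt(2.0 * math.pi / n_neurons)

def relative_error(n_neurons):
    exact = exact_integral(n_neurons)
    return abs(laplace_estimate(n_neurons) - exact) / exact

for n in (10, 100, 1000):
    print(n, relative_error(n))
```

The relative error is expected to shrink roughly like 1/N, which is the order of the corrections retained in Eqs. (42) and (45).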

As a result,

$$ \phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) \equiv \sum_{\mu=1}^M p \big(\mathbf{x}_m(\mathbf{r})\big) \, P\big(\mu| \mathbf{x}_m(\mathbf{r})\big) \ln \frac{P(\mu|\mathbf{r})}{P \big(\mu|\mathbf{x}_m(\mathbf{r})\big)} $$

(46)

$$ \qquad = \sum_{\mu=1}^M p \big(\mathbf{x}_m(\mathbf{r})\big) \, P\big(\mu| \mathbf{x}_m(\mathbf{r})\big) \ln \left( 1 + \frac{P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)}{P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)} \right) $$

(47)

According to Eq. (45), $P(\mu|\mathbf{r})-P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)$ is of order 1/N, which entails, as ln (1 + z) ≈ z when z ≪ 1, that

$$ \phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = \frac{1}{2} \sum_{\mu=1}^M \operatorname{tr} \left( \mathbb{H}\big(\varphi_\mu(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) $$

(48)

$$ \qquad = \frac{1}{2} \operatorname{tr} \left( \mathbb{H}\left(\sum_{\mu=1}^M \varphi_\mu(\mathbf{x}|\mathbf{r})\right)\bigg|_{\mathbf{x}_m(\mathbf{r})} \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) $$

(49)

Now, since $\sum_{\mu=1}^M P(\mu|\mathbf{x}) = 1$ for every $\mathbf{x}$, we have $\sum_{\mu=1}^M \varphi_\mu(\mathbf{x}|\mathbf{r}) = 0$, which demonstrates that $\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0$ at this order.

We can now return to Eq. (34) and apply the saddle-point method, knowing that $\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0$. This leads to:

$$\begin{array}{l} \Delta = - \displaystyle\int d^N\mathbf{r} \, \exp \big(N G_m(\mathbf{r})\big) \sqrt{\dfrac{(2\pi)^K}{\det \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)\big|_{\mathbf{x}_m(\mathbf{r})}}} \\ \qquad \times \dfrac{1}{2} \operatorname{tr} \left( \mathbb{H}\big(\phi(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \, \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\big|_{\mathbf{x}_m(\mathbf{r})} \right) \end{array}$$

(50)

Recalling that $P(\mu|\mathbf{r}) - P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) = O(\frac{1}{N})$ and that $\sum_{\mu=1}^M P(\mu|\mathbf{x}) = 1$ for every $\mathbf{x}$, it is straightforward to show that, to leading order:

$$\begin{array}{l} \dfrac{1}{p\big(\mathbf{x}_m(\mathbf{r})\big)} \dfrac{\partial^2 \phi(\mathbf{x}|\mathbf{r}) }{\partial x_k \partial x_l} \Big|_{\mathbf{x}_m(\mathbf{r})} \\ \quad = - \displaystyle\sum_{\mu=1}^M \dfrac{1}{ P\big(\mu| \mathbf{x}_m(\mathbf{r})\big)} \dfrac{\partial P(\mu|\mathbf{x})}{\partial x_k} \Big|_{\mathbf{x}_m(\mathbf{r})} \dfrac{\partial P(\mu|\mathbf{x})}{\partial x_l} \Big|_{\mathbf{x}_m(\mathbf{r})} \end{array}$$

(51)

i.e.,

$$\begin{array}{l} \dfrac{1}{p\big(\mathbf{x}_m(\mathbf{r})\big)} \mathbb{H}\big(\phi(\mathbf{x}|\mathbf{r})\big)\big|_{\mathbf{x}_m(\mathbf{r})} \\ \quad = - \displaystyle\sum_{\mu=1}^M \dfrac{1}{ P\big(\mu| \mathbf{x}_m(\mathbf{r})\big)} \nabla P(\mu|\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})} \nabla P(\mu|\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})}^{\top} \end{array}$$

(52)

where $\nabla \xi(\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})}$ is the column vector of the partial derivatives of ξ

$$ \nabla \xi(\mathbf{x}) = (\ldots, \partial\xi(\mathbf{x})/\partial x_k ,\ldots)^{\top} $$

(53)

evaluated at $\mathbf{x}_m(\mathbf{r})$.

Putting Eqs. (44), (50) and (52) together eventually leads to:

$$ \Delta = \frac{1}{2} \int d^N\mathbf{r} P(\mathbf{r}) \left(\sum_{\mu=1}^M \frac{1}{ P(\mu| \mathbf{x})} \nabla P(\mu|\mathbf{x}) \nabla P(\mu|\mathbf{x})^{\top} \right) \bigg|_{\mathbf{x}_m(\mathbf{r})} : \mathbb{H}\big(-NG(\mathbf{r}|\mathbf{x})\big)^{-1}\Big|_{\mathbf{x}_m(\mathbf{r})} $$

(54)

where ‘:’ denotes the Frobenius inner product on $\mathcal{M}_K(\mathbb{R})$, defined as follows:

$$ \forall A,B \in \mathcal{M}_K(\mathbb{R}), \; A:B = \operatorname{tr}(A^{\top}B) = \sum_{k,l} A_{kl}B_{kl}. $$

(55)
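As a small illustration of Eq. (55), the following sketch computes the Frobenius inner product both as an element-wise sum and as a trace. The matrices `A` and `B` are arbitrary examples, and plain nested lists are used to keep the snippet dependency-free.

```python
def frobenius(a, b):
    """A:B as the sum over k, l of A_kl * B_kl."""
    return sum(a[k][l] * b[k][l] for k in range(len(a)) for l in range(len(a[0])))

def trace_of_transpose_product(a, b):
    """tr(A^T B), computed from the diagonal entries (A^T B)_ll = sum_k A_kl B_kl."""
    size = len(a)
    return sum(sum(a[k][l] * b[k][l] for k in range(size)) for l in range(size))

# Arbitrary 2 x 2 example matrices (not from the paper).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]

print(frobenius(A, B), trace_of_transpose_product(A, B))
```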

Integration over r. Proceeding as in Brunel and Nadal (1998, pp. 1753–1754) to integrate over r, we get:

$$\begin{array}{l} I(\mu,\mathbf{x}) - I(\mu,\mathbf{r}) \\ \quad = \dfrac{1}{2} \displaystyle\int d^K\mathbf{x} \; p(\mathbf{x}) \left[ \sum_{\mu=1}^M \dfrac{1}{P(\mu|\mathbf{x})} \nabla P(\mu|\mathbf{x}) \nabla P(\mu|\mathbf{x})^{\top} \right] : F_{\text{code}}^{-1}(\mathbf{x}) \end{array}$$

(56)

where $F_{\text{code}}(\mathbf{x})$ is the K×K Fisher information matrix of the neuronal population,

$$ \big[F_{\text{code}}(\mathbf{x})\big]_{kl} = - \int d^N\mathbf{r} \; P(\mathbf{r}|\mathbf{x}) \, \frac{\partial^2 \ln P(\mathbf{r}|\mathbf{x}) }{\partial x_k \partial x_l} $$

(57)

Noticing that

$$ \sum_{\mu=1}^M \frac{1}{P(\mu|\mathbf{x})} \frac{\partial P(\mu|\mathbf{x})}{\partial x_k} \frac{\partial P(\mu|\mathbf{x})}{\partial x_l} = - \sum_{\mu=1}^M P(\mu|\mathbf{x}) \frac{\partial^2 \ln P(\mu|\mathbf{x}) }{\partial x_k \partial x_l} $$

(58)

we can introduce the K×K Fisher information matrix of the categories, $F_{\text{cat}}(\mathbf{x})$, characterizing the sensitivity of μ with respect to small variations of x:

$$ \big[F_{\text{cat}}(\mathbf{x})\big]_{kl} = -\sum_{\mu=1}^M P(\mu|\mathbf{x}) \frac{\partial^2 \ln P(\mu|\mathbf{x})}{\partial x_k \partial x_l} $$

(59)
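The identity in Eq. (58), and hence the two equivalent forms of $F_{\text{cat}}$, can be checked numerically in one dimension. The sketch below assumes a hypothetical two-category problem with a logistic posterior of slope `BETA` (not taken from the paper) and uses central finite differences; for this posterior both forms should reduce to BETA² P(1|x) P(2|x).

```python
import math

BETA = 2.0  # hypothetical steepness of the boundary between two categories at x = 0

def posterior(x, mu):
    """P(mu | x) for mu in {0, 1}, with a logistic transition at x = 0."""
    p1 = 1.0 / (1.0 + math.exp(-BETA * x))
    return p1 if mu == 0 else 1.0 - p1

def first_derivative(f, x, h=1e-3):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-3):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def f_cat_from_gradients(x):
    """Left-hand side of Eq. (58): sum over mu of P'(mu|x)^2 / P(mu|x)."""
    return sum(first_derivative(lambda y: posterior(y, mu), x) ** 2 / posterior(x, mu)
               for mu in (0, 1))

def f_cat_from_log_curvature(x):
    """Right-hand side of Eq. (58): minus sum over mu of P(mu|x) (ln P(mu|x))''."""
    return -sum(posterior(x, mu)
                * second_derivative(lambda y: math.log(posterior(y, mu)), x)
                for mu in (0, 1))

for x in (-1.0, 0.0, 0.4):
    print(x, f_cat_from_gradients(x), f_cat_from_log_curvature(x))
```

In this example the Fisher information of the categories peaks at the boundary x = 0 and vanishes far from it.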

This eventually leads to Eq. (23):

$$ I(\mu,\mathbf{x}) - I(\mu,\mathbf{r}) = \frac{1}{2} \int d^K\mathbf{x} \, p(\mathbf{x}) \, F_{\text{cat}}(\mathbf{x}):F_{\text{code}}^{-1}(\mathbf{x}) $$

(60)

Since $F_{\text{code}}$ is of order N, one has in particular that I(μ,x) − I(μ,r) is of order 1/N.
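To illustrate the 1/N scaling, the sketch below evaluates the right-hand side of Eq. (60) for a one-dimensional stimulus. Everything specific in it is an assumption made for illustration: two equiprobable Gaussian categories centred at −1 and +1 with spread `SIGMA`, and a population of independent Poisson cells with Gaussian tuning curves evenly spaced on [−3, 3], for which the Fisher information is the sum of f_i'(x)²/f_i(x). It only computes the asymptotic formula; it does not estimate the mutual information itself.

```python
import math

SIGMA = 0.5              # hypothetical within-category spread of the stimulus
BETA = 2.0 / SIGMA ** 2  # slope of the log posterior odds for centres at -1 and +1

def density(x):
    """p(x): equal-weight mixture of the two Gaussian category densities."""
    norm = SIGMA * math.sqrt(2.0 * math.pi)
    return 0.5 * sum(math.exp(-0.5 * ((x - m) / SIGMA) ** 2) for m in (-1.0, 1.0)) / norm

def f_cat(x):
    """Scalar F_cat(x) = sum_mu P'(mu|x)^2 / P(mu|x) = BETA^2 P(1|x) P(2|x)."""
    p1 = 1.0 / (1.0 + math.exp(-BETA * x))
    return BETA ** 2 * p1 * (1.0 - p1)

def f_code(x, n_cells, width=0.5, mean_count=10.0):
    """Fisher information of independent Poisson cells with Gaussian tuning curves."""
    total = 0.0
    for i in range(n_cells):
        centre = -3.0 + 6.0 * i / (n_cells - 1)
        u = (x - centre) / width
        rate = mean_count * math.exp(-0.5 * u * u)  # expected spike count f_i(x)
        total += rate * (u / width) ** 2            # f_i'(x)^2 / f_i(x)
    return total

def predicted_loss(n_cells, lo=-3.0, hi=3.0, steps=2000):
    """Right-hand side of Eq. (60) for K = 1, by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += density(x) * f_cat(x) / f_code(x, n_cells)
    return 0.5 * h * total

for n in (25, 50, 100):
    print(n, predicted_loss(n))
```

Doubling the number of cells at fixed tuning width roughly doubles $F_{\text{code}}$ and therefore roughly halves the predicted information loss.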

Appendix B: Derivation of Eq. (28)

Each cell has an activity $r_i$ equal to 1 if $x$ is in $[\theta_i, \theta_{i+1}]$, and 0 otherwise. The width of the $i$th tuning curve is thus $a_i = \theta_{i+1} - \theta_i$, and we define the preferred stimuli as the centers of the receptive fields, $x_i \equiv (\theta_i + \theta_{i+1})/2$.

We want to compute

$$ \Delta \equiv I(\mu,x)-I(\mu,\mathbf{r}) = {\cal H}(\mu | \mathbf{r}) - {\cal H}(\mu | x) $$

(61)

where

$$ \mathcal{H}(\mu | x) = - \int dx \; p(x) \sum_{\mu=1}^M P(\mu|x) \ln P(\mu|x); $$

$$ \mathcal{H}(\mu | \mathbf{r}) = - \int d^N\mathbf{r} \; P(\mathbf{r}) \sum_{\mu=1}^M Q(\mu | \mathbf{r}) \ln Q(\mu | \mathbf{r}) $$

in terms of the $\{\theta_i, i = 1,\ldots,N+1\}$, and to determine the optimal choice of the $\{\theta_i, i = 1,\ldots,N+1\}$, or equivalently of the $\{x_i, i = 1,\ldots,N\}$, that makes this difference Δ as small as possible for a large but finite value of N.

If we set:

$$ \widetilde{P}_i \equiv \int_{\theta_{i}}^{\theta_{i+1}} dx \; p(x) $$

(62)

$$ \widetilde{P}_i(\mu) \equiv \int_{\theta_{i}}^{\theta_{i+1}} dx \; p(x |\mu) $$

(63)

$$\widetilde{Q}_{i,\mu} \equiv P(\mu | r_i=1) = \frac{\widetilde{P}_i(\mu) \, q_{\mu}}{\widetilde{ P}_i} $$

(64)

$$Q_{\mu}(x) \equiv P(\mu | x) $$

(65)

then we can write Δ from Eq. (61) as:

$$ \Delta = \sum_{i} \, \Delta_i $$

(66)

with

$$ \Delta_i = \int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \Big[ \mathcal{H}(\{ \widetilde{Q}_{i,\mu}\}) - \mathcal{H}(\{ Q_{\mu}(x)\}) \Big] $$

(67)

where $\mathcal{H}(\{Q_{\mu}\}_{\mu=1}^M )$ is the mixing entropy:

$$ \mathcal{H}\left(\{Q_{\mu}\}_{\mu=1}^M\right) \; = \; - \; \displaystyle\sum_{\mu=1}^M \, Q_{\mu} \ln Q_{\mu} $$

(68)

The quantity $\mathcal{H}(\{ Q_{\mu}(x)\})$ is zero on a homogeneous domain (a region where a single category has probability one), hence intervals included in such a domain do not contribute. Suppose, on the contrary, that $\mathcal{H}(\{ Q_{\mu}(x)\})$ is rapidly varying over the full range under consideration. We then expect the optimal distribution of the $\theta_i$ to be dense, that is, the widths $a_i = \theta_{i+1} - \theta_i$ to be small.

Recalling that $x_i = (\theta_{i+1} + \theta_i)/2$ and assuming $a_i \ll 1$,

$$ \begin{array}{*{20}c} \widetilde{P}_i &=& \int_{-a_i/2}^{a_i/2} dz \left[p(x_i) + z p'(x_i) + \frac{z^2}{2} p''(x_i) \right] \\ &=& a_i \; p(x_i) + \frac{ a_i^3}{24} \; p''(x_i). \end{array}$$

Likewise, $ \widetilde{P}_i (\mu) = a_i \; p(x_i|\mu) + a_i^3/24 \; p''(x_i|\mu) $. Thus,

$$\widetilde{Q}_{i,\mu} = Q_{\mu}(x_i) + \frac{a_i^2}{24} A_{i,\mu} $$

(69)

with

$$ A_{i,\mu} = Q_{\mu}''(x_i) + 2 \frac{p'(x_i)}{p(x_i)} \; Q_{\mu}'(x_i) $$

(70)
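The expansion in Eqs. (69) and (70) can be compared with a direct numerical evaluation of Eq. (64). The sketch below assumes a hypothetical one-dimensional problem with two equiprobable unit-variance Gaussian categories centred at +1 and −1, so that Q_1(x) is logistic with slope `BETA` = 2; the cell centre and width used at the end are arbitrary illustrative values.

```python
import math

BETA = 2.0  # log-odds slope for unit-variance Gaussian categories centred at +1 and -1

def gauss(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def p(x):
    """Stimulus density p(x) with equal priors q_mu = 1/2."""
    return 0.5 * gauss(x, 1.0) + 0.5 * gauss(x, -1.0)

def p_prime(x):
    return -0.5 * ((x - 1.0) * gauss(x, 1.0) + (x + 1.0) * gauss(x, -1.0))

def q1(x):
    """Q_1(x) = P(mu = 1 | x), with category 1 centred at +1."""
    return 1.0 / (1.0 + math.exp(-BETA * x))

def exact_q_tilde(x_i, a, steps=2000):
    """Eq. (64) for mu = 1: ratio of midpoint-rule integrals over [x_i - a/2, x_i + a/2]."""
    h = a / steps
    nodes = [x_i - 0.5 * a + (k + 0.5) * h for k in range(steps)]
    joint_mass = sum(0.5 * gauss(x, 1.0) for x in nodes)  # q_1 * p(x | 1); h cancels in the ratio
    total_mass = sum(p(x) for x in nodes)
    return joint_mass / total_mass

def expanded_q_tilde(x_i, a):
    """Eqs. (69)-(70): Q_1(x_i) + (a^2 / 24) * (Q_1'' + 2 (p'/p) Q_1')."""
    q = q1(x_i)
    dq = BETA * q * (1.0 - q)
    d2q = BETA ** 2 * q * (1.0 - q) * (1.0 - 2.0 * q)
    a_coeff = d2q + 2.0 * p_prime(x_i) / p(x_i) * dq
    return q + a * a / 24.0 * a_coeff

x_i, a = 0.3, 0.2
print(q1(x_i), expanded_q_tilde(x_i, a), exact_q_tilde(x_i, a))
```

The residual between the exact value and the expansion is expected to be of order $a_i^4$, much smaller than the $a_i^2$ correction itself.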

A Taylor expansion then gives:

$$\begin{array}{l} \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) = \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) + \nabla \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) \cdot \left( \cdots \; \dfrac{a_i^2 A_{i,\mu}}{24} \; \cdots \right)^{\top} \\ \quad + \dfrac{1}{2} \left( \cdots \; \dfrac{a_i^2 A_{i,\mu}}{24} \; \cdots \right) \mathbb{H}_{\mathcal{H}}\big( \{Q_{\mu}(x_i)\} \big) \left( \cdots \; \dfrac{a_i^2 A_{i,\mu}}{24} \; \cdots \right)^{\top} \end{array}$$

(71)

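where $\nabla \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big)$ and $\mathbb{H}_{\mathcal{H}}\big(\{Q_{\mu}(x_i)\}\big)$ are respectively the gradient and the Hessian of $\mathcal{H}$ evaluated at $\{Q_{\mu}(x_i)\}$ (see Eqs. (53) and (43)). Since $\partial \mathcal{H}/\partial Q_{\mu} = -(\ln Q_{\mu} + 1)$ and the Hessian term is of order $a_i^4$, keeping terms up to order $a_i^2$ leads to: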

$$ \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) = \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) - \frac{a_i^2}{24} \sum_{\mu=1}^M A_{i,\mu} (\ln Q_{\mu}(x_i) + 1) $$

(72)

Thus we have:

$$\begin{array}{l} \displaystyle\int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \mathcal{H}\big(\{\widetilde{Q}_{i,\mu}\}\big) \\ \quad = a_i \, p(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) + \dfrac{a_i^3}{24} \left( p''(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) - p(x_i) \displaystyle\sum_{\mu=1}^M A_{i,\mu} \big(\ln Q_{\mu}(x_i) + 1\big) \right) \end{array}$$

(73)

Moreover,

$$\begin{array}{l} \displaystyle\int_{\theta_i}^{\theta_{i+1}} dx \; p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) \\ \quad = a_i \, p(x_i) \, \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big) + \dfrac{a_i^3}{24} \dfrac{\partial^2 \big[ p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) \big]}{\partial x^2} \Big|_{x_i} \end{array}$$

(74)

with

$$\begin{array}{l} \dfrac{\partial^2 \big[ p(x) \, \mathcal{H}\big(\{Q_{\mu}(x)\}\big) \big]}{\partial x^2} \Big|_{x_i} \\ \quad = - 2 p'(x_i) \displaystyle\sum_{\mu=1}^M Q_{\mu}'(x_i) \big(\ln Q_{\mu}(x_i) + 1\big) \\ \qquad - p(x_i) \displaystyle\sum_{\mu=1}^M \left\{ Q_{\mu}''(x_i) \big(\ln Q_{\mu}(x_i) + 1\big) + \dfrac{Q_{\mu}'(x_i)^{2}}{Q_{\mu}(x_i)} \right\} \\ \qquad - p''(x_i) \displaystyle\sum_{\mu=1}^M Q_{\mu}(x_i) \ln Q_{\mu}(x_i) \end{array}$$

(75)

Putting Eqs. (73), (74) and (75) together eventually leads to:

$$ \Delta_i = \frac{a_i^3}{24} \; p(x_i) \sum_{\mu=1}^{M} \frac{P'(\mu |x_i)^2}{P(\mu |x_i)} $$

(76)

hence

$$ \Delta = \sum_i \frac{a_i^3}{24} \;p(x_i) \; F_{\text{cat}}(x_i) $$

(77)
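Eq. (77) can be compared with a direct evaluation of Eqs. (66) and (67). The sketch below is an illustration under stated assumptions: two equiprobable unit-variance Gaussian categories centred at +1 and −1 (so that the posterior is logistic with slope `BETA` = 2), and equal-width cells tiling the hypothetical range [−4, 4]. It does not search for the optimal thresholds; it only checks the small-width approximation for one particular tiling.

```python
import math

BETA = 2.0  # log-odds slope for unit-variance Gaussian categories centred at +1 and -1

def density(x):
    """p(x): equal-weight mixture of the two category densities."""
    g = lambda m: math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2.0 * math.pi)
    return 0.5 * g(1.0) + 0.5 * g(-1.0)

def posterior(x):
    """(Q_1(x), Q_2(x)) = (P(1|x), P(2|x))."""
    p1 = 1.0 / (1.0 + math.exp(-BETA * x))
    return (p1, 1.0 - p1)

def mixing_entropy(qs):
    """Eq. (68), in nats."""
    return -sum(q * math.log(q) for q in qs if q > 0.0)

def f_cat(x):
    """Scalar F_cat(x) = sum_mu P'(mu|x)^2 / P(mu|x) = BETA^2 Q_1 Q_2."""
    p1, p2 = posterior(x)
    return BETA ** 2 * p1 * p2

def exact_loss(n_cells, lo=-4.0, hi=4.0, steps=200):
    """Sum of Eq. (67) over cells, each integral done by the midpoint rule."""
    a = (hi - lo) / n_cells
    h = a / steps
    total = 0.0
    for i in range(n_cells):
        xs = [lo + i * a + (k + 0.5) * h for k in range(steps)]
        weights = [h * density(x) for x in xs]
        mass = sum(weights)
        # Q~_{i,mu} = P(mu | r_i = 1): posterior averaged over the cell, Eq. (64).
        q_cell = [sum(w * posterior(x)[mu] for w, x in zip(weights, xs)) / mass
                  for mu in range(2)]
        h_cell = mixing_entropy(q_cell)
        total += sum(w * (h_cell - mixing_entropy(posterior(x)))
                     for w, x in zip(weights, xs))
    return total

def predicted_loss(n_cells, lo=-4.0, hi=4.0):
    """Eq. (77) with equal widths a_i = (hi - lo) / n_cells."""
    a = (hi - lo) / n_cells
    centres = [lo + (i + 0.5) * a for i in range(n_cells)]
    return sum(a ** 3 / 24.0 * density(x) * f_cat(x) for x in centres)

for n in (20, 40, 80):
    print(n, exact_loss(n), predicted_loss(n))
```

For equal-width cells, Eq. (77) scales as the square of the cell width, so halving the width should reduce the loss by about a factor of four.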

About this article

Cite this article

Bonnasse-Gahot, L., Nadal, JP. Neural coding of categories: information efficiency and optimal population codes. J Comput Neurosci 25, 169–187 (2008). https://doi.org/10.1007/s10827-007-0071-5

Received: 11 May 2007
Revised: 06 October 2007
Accepted: 04 December 2007
Published: 31 January 2008
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10827-007-0071-5
