Abstract
This paper presents an analytical study of the coding of a discrete set of categories by a large assembly of neurons. We consider population coding schemes, which can also be seen as instances of the exemplar models proposed in the literature to account for phenomena in the psychophysics of categorization. We quantify coding efficiency by the mutual information between the set of categories and the neural code, and we characterize the properties of the most efficient codes in different regimes, which correspond essentially to different signal-to-noise ratios. One main outcome is that, in the high signal-to-noise ratio limit, the Fisher information at the population level should be greatest between categories, which is achieved by having many cells with the stimulus-discriminating parts (steepest slope) of their tuning curves placed in the transition regions between categories in stimulus space. We show that these properties are in good agreement with both psychophysical data and the neurophysiology of the inferotemporal cortex in the monkey, a cortical area known to be specifically involved in classification tasks.
References
Abbott, L., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11, 91–101.
Abramson, A., & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. In Proceedings of the sixth international congress of phonetic sciences. Prague: Academia.
Ashby, F., & Spiering, B. (2004). The neurobiology of category learning. Behavioral and Cognitive Neuroscience Reviews, 3(2), 101–113.
Averbeck, B., Latham, P., & Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7, 358–366.
Blahut, R. E. (1987). Principles and practice of information theory. Boston, MA: Addison-Wesley Longman.
Brunel, N., & Nadal, J.-P. (1998). Mutual information, Fisher information, and population coding. Neural Computation, 10, 1731–1757.
Butts, D. A., & Goldman, M. S. (2006). Tuning curves, neuronal variability, and sensory coding. PLoS Biology, 4(4), e92.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). New York: Wiley.
Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge: MIT Press.
Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.
Fisher, J., & Principe, J. (1998). A methodology for information theoretic feature extraction. In A. Stuberud (Ed.), Proceedings of the IEEE international joint conference on neural networks. Piscataway: IEEE.
Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.
Freedman, D., Riesenhuber, M., Poggio, T., & Miller, E. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23(12), 5235–5246.
Georgopoulos, A., Schwartz, A., & Kettner, R. (1986). Neuronal population coding of movement direction. Science, 233, 1416–1419.
Goldstone, R. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123(2), 178–200.
Green, D., & Swets, J. (1988). Signal detection theory and psychophysics, reprint edition. Los Altos, CA: Peninsula.
Guenther, F., Husain, F., Cohen, M., & Shinn-Cunningham, B. (1999). Effects of categorization and discrimination training on auditory perceptual space. Journal of the Acoustical Society of America, 106, 2900–2912.
Han, Y., Köver, H., Insanally, M., Semerdjian, J., & Bao, S. (2007). Early experience impairs perceptual discrimination. Nature Neuroscience, 10(9), 1191–1197.
Harnad, S. (Ed.) (1987). Categorical perception: The groundwork of cognition. New York: Cambridge University Press.
Harnad, S. (2005). Cognition is categorization. In H. Cohen & C. Lefebvre (Eds.), Handbook of categorization. Amsterdam: Elsevier.
Hillenbrand, J., Getty, L., Clark, M., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97(5), 3099–3111.
Hintzman, D. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93(4), 411–428.
Humphreys, G., & Forde, E. (2001). Hierarchies, similarity and interactivity in object recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain Sciences, 24, 453–509.
Hung, C., Kreiman, G., Poggio, T., & DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310, 863–866.
Jiang, X., Bradley, E., Rini, R., Zeffiro, T., VanMeter, J., & Riesenhuber, M. (2007). Categorization training results in shape- and category-selective human neural plasticity. Neuron, 53, 891–903.
Kang, K., Shapley, R., & Sompolinsky, H. (2004). Information tuning of populations of neurons in primary visual cortex. Journal of Neuroscience, 24(13), 3726–3735.
Kang, K., & Sompolinsky, H. (2001). Mutual information of population codes and distance measures in probability space. Physical Review Letters, 86(21), 4958–4961.
Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 4296–4309.
Knoblich, U., Freedman, D., & Riesenhuber, M. (2002). Categorization in IT and PFC: Model and experiments. AI Memo 2002-007. Cambridge, MA: MIT AI Laboratory.
Kobatake, E., Wang, G., & Tanaka, K. (1998). Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. Journal of Neurophysiology, 80, 324–330.
Koida, K., & Komatsu, H. (2007). Effects of task demands on the responses of color-selective neurons in the inferior temporal cortex. Nature Neuroscience, 10(1), 108–116.
Kruschke, J. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.
Kuhl, P. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception & Psychophysics, 50(2), 93–107.
Kuhl, P., & Padden, D. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73(3), 1003–1010.
Li, W., Piech, V., & Gilbert, C. (2004). Perceptual learning and top-down influences in primary visual cortex. Nature Neuroscience, 7(6), 651–658.
Liberman, A., Harris, K., Hoffman, H., & Griffith, B. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358–369.
Livingston, K., Andrews, J., & Harnad, S. (1998). Categorical perception effects induced by category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 24(3), 732–753.
Logothetis, N., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5(5), 552–563.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.
Nadal, J.-P. (1994). Formal neural networks: From supervised to unsupervised learning. In E. Goles & S. Martinez (Eds.), Cellular automata, dynamical systems and neural networks. Mathematics and its applications (Vol. 282, pp. 147–166). Norwell: Kluwer.
Nosofsky, R. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115(1), 39–57.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4(12), 1244–1252.
Palmeri, T., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–304.
Paradiso, M. (1988). A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.
Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.
Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497.
Pouget, A., Zhang, K., Deneve, S., & Latham, P. (1998). Statistically efficient estimation using population coding. Neural Computation, 10, 373–401.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge: MIT Press.
Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204.
Schölkopf, B., Burges, C., & Smola, A. (Eds.) (1999). Advances in kernel methods—support vector learning. Cambridge: MIT Press.
Seriès, P., Latham, P., & Pouget, A. (2004). Tuning curve sharpening for orientation selectivity: Coding efficiency and the impact of correlations. Nature Neuroscience, 7(10), 1129–1135.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences of the United States of America, 90, 10749–10753.
Sigala, N. (2004). Visual categorization and the inferior temporal cortex. Behavioural Brain Research, 149, 1–7.
Sigala, N., & Logothetis, N. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.
Softky, W., & Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. The Journal of Neuroscience, 13(1), 334–350.
Sompolinsky, H., Yoon, H., Kang, K., & Shamir, M. (2001). Population coding in neuronal systems with correlated noise. Physical Review E, 64(5), 051904.
Stein, R. (1967). The information capacity of nerve cells using a frequency code. Biophysical Journal, 7, 797–826.
Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400, 869–873.
Tanaka, K. (1996). Inferotemporal cortex and object vision. Annual Review of Neuroscience, 19, 109–139.
Taube, J., Muller, R., & Ranck, J. B., Jr. (1990). Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. The Journal of Neuroscience, 10(2), 420–435.
Thomas, E., Van Hulle, M., & Vogels, R. (2001). Encoding of categories by noncategory-specific neurons in the inferior temporal cortex. Journal of Cognitive Neuroscience, 13(2), 190–200.
Tolhurst, D., Movshon, J., & Dean, A. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23, 775–785.
Torkkola, K., & Campbell, W. M. (2000). Mutual information in learning feature transformations. In Proc. 17th international conf. on machine learning (pp. 1015–1022). San Francisco, CA: Morgan Kaufmann.
Vogels, R. (1999). Categorization of complex visual images by rhesus monkeys. Part 2: Single-cells study. European Journal of Neuroscience, 11, 1239–1255.
Vogels, R., & Orban, G. (1990). How well do response changes of striate neurons signal differences in orientation: A study in the discriminating monkey. The Journal of Neuroscience, 10(11), 3543–3558.
Wilson, M., & DeBauche, B. (1981). Inferotemporal cortex and categorical perception of visual stimuli by monkeys. Neuropsychologia, 19(1), 29–41.
Yoon, H., & Sompolinsky, H. (1999). The effect of correlations on the Fisher information of population codes. In M. Kearns, S. Solla, & D. Cohn (Eds.), Advances in neural information processing systems 11 (NIPS-11) (pp. 167–173). Cambridge: MIT Press.
Young, M., & Yamane, S. (1992). Sparse population coding of faces in the inferotemporal cortex. Science, 256, 1327–1330.
Acknowledgements
This work is part of a project “Acqlang” supported by the French National Research Agency (ANR-05-BLAN-0065-01). LBG acknowledges a fellowship from the Délégation Générale pour l’Armement. JPN is a Centre National de la Recherche Scientifique member. The initial motivation for this work comes from (psycho- and neuro-) computational issues in the perception of phonemes: we thank Sharon Peperkamp and Janet Pierrehumbert for introducing us to this topic and for valuable discussions. LBG is grateful to the members of the Laboratoire de Sciences Cognitives et Psycholinguistique de l’ENS, especially to Emmanuel Dupoux, for numerous and stimulating discussions. We acknowledge useful inputs from the referees, and most especially, we thank one of them for a detailed list of constructive comments.
Additional information
Action Editor: Jonathan D. Victor
Appendices
Appendix A: Derivation of Eq. (23)
The goal of this appendix and the following one is to derive Eqs. (18), (23) and (28). Remark: given the simplicity of these results and their underlying similarity, we expect that simpler derivations than ours exist.
When N goes to ∞, we expect the mutual information I(μ, r) to converge towards I(μ, x), and we are interested in the first nontrivial correction to this asymptotic limit. We thus compute, for large N, the difference \(I(\mu,\mathbf{x}) - I(\mu,\mathbf{r})\).
One can write
where
We follow the same approach as in Brunel and Nadal (1998). The first step consists in integrating over x. Taking the large N limit, we show that the leading-order contribution of the right-hand term of Eq. (34) is zero. We then seek the first correction using the Laplace (steepest-descent) method. The last step consists in integrating over r.
We introduce G(r|x) defined as:
and assume that it has a single global maximum at \(\mathbf{x} = \mathbf{x}_m(\mathbf{r})\). We can rewrite Eq. (34) in the following way:
Integration over x. In order to integrate Eq. (34) over x, let us first show that
We begin by evaluating
with
Using the saddle-point method and assuming that N ≫ 1 and K ≪ N, we find:
where ℍ is the Hessian matrix of ξ,
evaluated at \(\mathbf{x}_m(\mathbf{r})\).
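As an illustration of the saddle-point (Laplace) step, the following sketch compares a one-dimensional integral of the form \(\int e^{N g(x)}\,dx\) with its Gaussian approximation \(e^{N g(x_m)}\sqrt{2\pi/(N|g''(x_m)|)}\). The function `g`, the integration grid and all names are hypothetical choices made for this example and are not taken from the paper; under these assumptions the relative error is expected to shrink as N grows.

```python
import numpy as np

def g(x):
    # Hypothetical smooth function with a single global maximum at x = 0,
    # where g(0) = 0 and g''(0) = -1.
    return 1.0 - np.cosh(x)

def exact_integral(N, half_width=10.0, n_points=200001):
    # Brute-force Riemann sum of exp(N * g(x)); the integrand is
    # negligible at the boundaries of the grid.
    x = np.linspace(-half_width, half_width, n_points)
    dx = x[1] - x[0]
    return float(np.sum(np.exp(N * g(x))) * dx)

def laplace_integral(N, x_m=0.0, g_second=-1.0):
    # Gaussian (Laplace) approximation around the maximum x_m.
    return float(np.exp(N * g(x_m)) * np.sqrt(2.0 * np.pi / (N * abs(g_second))))

def relative_error(N):
    exact = exact_integral(N)
    return abs(laplace_integral(N) - exact) / exact

err_10 = relative_error(10)
err_100 = relative_error(100)
```

In this toy case the leading correction to the Gaussian approximation is of relative order 1/N, which mirrors the role played by the 1/N corrections in the derivation.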
Since
we have
As a result,
According to Eq. (45), \(P(\mu|\mathbf{r})-P\big(\mu|\mathbf{x}_m(\mathbf{r})\big)\) is of order 1/N, which entails, as ln(1 + z) ≈ z when |z| ≪ 1, that
Now, since \(\sum_{\mu=1}^M \varphi_\mu(\mathbf{x}) = 0\), we have shown that \(\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0\).
We can now return to Eq. (34) and apply the saddle-point method, knowing that \(\phi \big(\mathbf{x}_m(\mathbf{r})|\mathbf{r}\big) = 0\). This leads to:
Recalling that \(P(\mu|\mathbf{r})-P\big(\mu|\mathbf{x}_m(\mathbf{r})\big) \sim O(\frac{1}{N})\), it is straightforward to show that:
i.e.,
where \(\nabla \xi(\mathbf{x})\big|_{\mathbf{x}_m(\mathbf{r})}\) is the column vector of the partial derivatives of ξ
evaluated at \(\mathbf{x}_m(\mathbf{r})\).
Putting Eqs. (44), (50) and (52) together eventually leads to:
where ‘:’ denotes the Frobenius inner product on \(\mathcal{M}_K(\mathbb{R})\), defined as follows:
\[ A : B = \sum_{i,j=1}^{K} A_{ij} B_{ij} = \mathrm{Tr}\big(A^{\top} B\big). \]
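The Frobenius inner product is the sum of entrywise products of two matrices, which coincides with the trace of the transpose of the first times the second. A small numerical check follows; the matrices `A` and `B` and the helper `frobenius_inner` are hypothetical and serve only as an illustration.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product A : B, i.e. the sum over i, j of A_ij * B_ij."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(np.sum(A * B))

# Two arbitrary 2x2 matrices (illustrative values only).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])

via_sum = frobenius_inner(A, B)          # entrywise definition
via_trace = float(np.trace(A.T @ B))     # equivalent trace form
```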
Integration over r. Proceeding as in Brunel and Nadal (1998, pp. 1753–1754) to integrate over r, we get:
where \(F_{\text{code}}(\mathbf{x})\) is the K×K Fisher information matrix of the neuronal population,
Noticing that
we can introduce the K×K Fisher information matrix of the categories, \(F_{\text{cat}}(\mathbf{x})\), characterizing the sensitivity of μ with respect to small variations of x:
This eventually leads to Eq. (23):
Since \(F_{\text{code}}\) is of order N, one has in particular that I(μ,x) − I(μ,r) is of order 1/N.
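To make the two Fisher information quantities concrete, here is a minimal one-dimensional sketch (K = 1). It rests on assumptions that are not taken from the paper: two equiprobable Gaussian categories, independent Poisson cells with Gaussian tuning curves tiling the stimulus axis uniformly, and the standard forms \(F_{\text{cat}}(x) = \sum_\mu P'(\mu|x)^2 / P(\mu|x)\) and \(F_{\text{code}}(x) = \sum_i f_i'(x)^2 / f_i(x)\). All function names and parameter values are hypothetical.

```python
import numpy as np

def posterior(x, means=(-1.0, 1.0), sigma=1.0):
    """P(mu|x) for equiprobable Gaussian categories (illustrative assumption)."""
    x = np.atleast_1d(x).astype(float)
    m = np.asarray(means, dtype=float)[:, None]
    log_lik = -0.5 * ((x[None, :] - m) / sigma) ** 2
    log_lik -= log_lik.max(axis=0)           # numerical stability
    lik = np.exp(log_lik)
    return lik / lik.sum(axis=0)             # shape (M, len(x))

def fisher_cat(x, eps=1e-4):
    """F_cat(x) = sum_mu P'(mu|x)^2 / P(mu|x), with central differences."""
    x = np.atleast_1d(x).astype(float)
    p = posterior(x)
    dp = (posterior(x + eps) - posterior(x - eps)) / (2.0 * eps)
    return np.sum(dp ** 2 / p, axis=0)

def fisher_code(x, n_cells, width=0.5, peak=10.0, lo=-4.0, hi=4.0):
    """Fisher information of independent Poisson cells with Gaussian tuning curves."""
    x = np.atleast_1d(x).astype(float)
    centers = np.linspace(lo, hi, n_cells)[:, None]
    d = x[None, :] - centers
    f = peak * np.exp(-0.5 * (d / width) ** 2)
    # f'(x)^2 / f(x) simplifies to f(x) * (d / width^2)^2 for Gaussian tuning.
    return np.sum(f * (d / width ** 2) ** 2, axis=0)

x_grid = np.linspace(-3.0, 3.0, 601)
f_cat = fisher_cat(x_grid)
x_peak = x_grid[np.argmax(f_cat)]   # location where categories are most discriminable
ratio = fisher_code(0.0, 100)[0] / fisher_code(0.0, 50)[0]
```

In this toy setting `f_cat` peaks at the boundary between the two categories, and doubling the number of cells roughly doubles `fisher_code`, consistent with \(F_{\text{code}}\) being of order N.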
Appendix B: Derivation of Eq. (28)
Each cell has an activity \(r_i\) equal to 1 if x is in \([\theta_i, \theta_{i+1}]\), and 0 otherwise. The width of the ith tuning curve is thus \(a_i = \theta_{i+1} - \theta_i\), and we define the preferred stimuli as the centers of the receptive fields, \(x_i \equiv (\theta_i + \theta_{i+1})/2\).
We want to compute
where
in terms of the \(\{\theta_i,\ i = 1,\dots,N+1\}\), and to determine the optimal choice of the \(\{\theta_i,\ i = 1,\dots,N+1\}\), or equivalently of the \(\{x_i,\ i = 1,\dots,N\}\), that makes this difference Δ as small as possible for a large but finite value of N.
If we set:
then we can write Δ from Eq. (61) as:
with
where \(\mathcal{H}(\{Q_{\mu}\}_{\mu=1}^M )\) is the mixing entropy:
\[ \mathcal{H}\big(\{Q_{\mu}\}_{\mu=1}^M\big) = -\sum_{\mu=1}^{M} Q_{\mu} \ln Q_{\mu}. \]
The quantity \(\mathcal{H}(\{ Q_{\mu}(x)\})\) is zero on a homogeneous domain, hence there is no contribution from intervals included in such a domain. Suppose on the contrary that \(\mathcal{H}(\{ Q_{\mu}(x)\})\) varies rapidly over the full range under consideration. We then expect the optimal distribution of the \(\theta_i\) to be dense, that is, \(a_i = \theta_{i+1} - \theta_i\) small.
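The setting of this appendix can be explored numerically. The sketch below computes I(μ, r) for binary cells that partition the stimulus axis, I(μ, x) by direct integration, and their difference for several values of N. It assumes two equiprobable Gaussian categories and equal-width cells (not an optimized placement), and takes the mixing entropy to be the Shannon entropy in nats; all names and parameter values are hypothetical.

```python
import math
import numpy as np

MEANS, SIGMA, PRIORS = (-1.0, 1.0), 1.0, (0.5, 0.5)   # hypothetical categories

def entropy(q):
    """Mixing entropy -sum_mu q_mu ln q_mu (nats), with 0 ln 0 = 0; q has shape (M, ...)."""
    q = np.asarray(q, dtype=float)
    safe = np.where(q > 0, q, 1.0)           # log(1) = 0 where q == 0
    return -np.sum(q * np.log(safe), axis=0)

def gauss_cdf(x, mean, sigma):
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def info_from_cells(thetas):
    """I(mu, r) when r_i = 1 if and only if x lies in [theta_i, theta_{i+1}]."""
    # joint[mu, i] = P(mu) * P(x in cell i | mu)
    joint = np.array([[q * (gauss_cdf(b, m, SIGMA) - gauss_cdf(a, m, SIGMA))
                       for a, b in zip(thetas[:-1], thetas[1:])]
                      for q, m in zip(PRIORS, MEANS)])
    p_cell = joint.sum(axis=0)
    keep = p_cell > 0
    post = joint[:, keep] / p_cell[keep]     # P(mu | active cell)
    return float(entropy(PRIORS) - np.sum(p_cell[keep] * entropy(post)))

def info_from_stimulus(lo=-8.0, hi=8.0, n=160001):
    """I(mu, x) = H(mu) - integral of p(x) H({P(mu|x)}) dx, by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    norm = SIGMA * math.sqrt(2.0 * math.pi)
    dens = np.array([q * np.exp(-0.5 * ((x - m) / SIGMA) ** 2) / norm
                     for q, m in zip(PRIORS, MEANS)])   # P(mu) p(x|mu)
    p_x = dens.sum(axis=0)
    post = dens / p_x
    return float(entropy(PRIORS) - np.sum(p_x * entropy(post)) * (x[1] - x[0]))

def delta(n_cells, lo=-8.0, hi=8.0):
    thetas = np.linspace(lo, hi, n_cells + 1)   # equal-width cells
    return info_from_stimulus(lo, hi) - info_from_cells(thetas)

deltas = {n: delta(n) for n in (16, 32, 64)}
```

With these choices the information loss stays positive and decreases as the partition is refined. Replacing the equal-width `thetas` with a placement that is denser where the mixing entropy varies rapidly would be the natural next experiment.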
Recalling that \(x_i = (\theta_{i+1} + \theta_i)/2\) and assuming \(a_i \ll 1\),
Likewise, \( \widetilde{P}_i (\mu) = a_i \; p(x_i|\mu) + a_i^3/24 \; p''(x_i|\mu) \). Thus,
with
A Taylor expansion then gives:
where \(\nabla \mathcal{H}\big(\{Q_{\mu}(x_i)\}\big)\) and ℍ are respectively the gradient and the Hessian of \(\mathcal{H}\) evaluated at \(\{Q_{\mu}(x_i)\}\) (see (53) and (43)). A second-order expansion leads to:
Thus we have:
Moreover,
with
Putting Eqs. (73), (74) and (75) together eventually leads to:
hence
Cite this article
Bonnasse-Gahot, L., Nadal, JP. Neural coding of categories: information efficiency and optimal population codes. J Comput Neurosci 25, 169–187 (2008). https://doi.org/10.1007/s10827-007-0071-5
DOI: https://doi.org/10.1007/s10827-007-0071-5