Abstract
Many methods used to analyze neuronal response assume that neuronal activity has a fundamentally linear relationship to the stimulus. However, some neurons are strongly sensitive to multiple directions in stimulus space and have a highly nonlinear response, and it can be difficult to find optimal stimuli for such neurons. We demonstrate how successive linear approximations of neuronal response can effectively carry out gradient ascent and move through stimulus space toward local maxima of the response. We present search results for a simple model neuron and for two models of a highly selective neuron.
Notes
The parameters were \(a_1 = a_2 = c_1 = c_2 = d_1 = d_4 = 0.5\), \(a_3 = a_4 = d_2 = d_3 = -0.5\), \(b_1 = b_2 = e_1 = e_4 = 4\), \(b_3 = b_4 = e_2 = e_3 = -4\), \(c_3 = c_4 = 0.2\), \(a_5 = -1\), \(b_5 = d_5 = e_5 = 0\), \(c_5 = 0.1\).
Since changing the noise magnitude changed \(F_{\text{ave}}\) with each iteration, we could not compare the line search values \(F_{\text{ave}}(\mathbf{x}_s)\) with the \(F_{\text{ave}}(\hat{\mathbf{x}}_{i-1})\) from the previous iteration. Instead, we used \(F_{\text{ave}}(\breve{\mathbf{x}}_{i-1})\) computed at the beginning of each line search to test for the success of that iteration.
References
Adrian, E. D. (1928). The basis of sensation. New York: W. W. Norton.
Anderson, M. J., & Micheli-Tzanakou, E. (2002). Auditory stimulus optimization with feedback from fuzzy clustering of neuronal responses. IEEE Transactions on Information Technology in Biomedicine, 6(2), 159–170.
Bazaraa, M. S., Sherali, H. D., & Shetty, C. M. (2006). Nonlinear programming: Theory and algorithms (3rd ed.). Hoboken, NJ: Wiley-Interscience.
Benda, J., Gollisch, T., Machens, C., & Herz, A. V. (2007). From response to stimulus: Adaptive sampling in sensory physiology. Current Opinion in Neurobiology, 17, 430–436.
Bleeck, S., Patterson, R., & Winter, I. (2003). Using genetic algorithms to find the most effective stimulus for sensory neurons. Journal of Neuroscience Methods, 125, 73–82.
Chichilnisky, E. J. (2001). A simple white noise analysis of neural light responses. Network-Computation in Neural Systems, 12, 199–213.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transmission in short spike sequences. Proceedings of the Royal Society of London Series B-Biological Sciences, 234, 379–414.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 15, 169–179.
Desimone, R. (1991). Face-selective cells in the temporal cortex of monkeys. Journal of Cognitive Neuroscience, 3, 1–8.
DiMattina, C., & Zhang, K. (2008). How optimal stimuli for sensory neurons are constrained by network architecture. Neural Computation, 20, 668–708.
Doupe, A. J., & Konishi, M. (1991). Song-selective auditory circuits in the vocal control system of the zebra finch. Proceedings of the National Academy of Sciences of the United States of America, 88, 11339–11343.
Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia: Society for Industrial and Applied Mathematics.
Eggermont, J. J., Johannesma, P. I. M., & Aertsen, A. M. H. J. (1983). Reverse-correlation methods in auditory research. Quarterly Reviews of Biophysics, 16, 341–414.
Foldiak, P. (2001). Stimulus optimization in primary visual cortex. Neurocomputing, 38–40, 1217–1222.
Harth, E., & Tzanakou, E. (1974). Alopex: A stochastic method for determining visual receptive fields. Vision Research, 14, 1475–1482.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181–198.
Janata, P., & Margoliash, D. (1999). Gradual emergence of song selectivity in sensorimotor structures of the male zebra finch song system. Journal of Neuroscience, 19, 5108–5118.
Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1233–1258.
Kadia, S. C., & Wang, X. (2003). Spectral integration in the primary auditory cortex of awake primates: Neurons with single-peaked and multi-peaked tuning curves. Journal of Neurophysiology, 89, 1603–1622.
Klein, D. J., Depireux, D. A., Simon, J. Z., & Shamma, S. A. (2000). Robust spectrotemporal reverse correlation for the auditory system: Optimizing stimulus design. Journal of Computational Neuroscience, 9, 85–111.
Koelling, M. E., & Nykamp, D. Q. (2008). Computing linear approximations to nonlinear neuronal response. Network-Computation in Neural Systems, 19, 286–313.
Lewi, J., Butera, R., & Paninski, L. (2009). Sequential optimal design of neurophysiology experiments. Neural Computation, 21(3), 619–687.
Lewi, J., Schneider, D. M., Woolley, S. M. N., & Paninski, L. (2011). Automating the design of informative sequences of sensory stimuli. Journal of Computational Neuroscience, 30, 181–200.
Lewicki, M. S. (1996). Intracellular characterization of song-specific neurons in the zebra finch auditory forebrain. Journal of Neuroscience, 16, 5854–5863.
Ma, X., & Suga, N. (2001). Plasticity of bat’s auditory system evoked by focal electric stimulation of auditory and/or somatosensory cortices. Journal of Neurophysiology, 85, 1078–1087.
Machens, C. K. (2002). Adaptive sampling by information maximization. Physical Review Letters, 88(22), 228104.
Machens, C. K., Gollisch, T., Kolesnikova, O., & Herz, A. V. (2005). Testing the efficiency of sensory coding with optimal stimulus ensembles. Neuron, 47(3), 447–456.
Margoliash, D. (1983). Acoustic parameters underlying the responses of song-specific neurons in the white-crowned sparrow. Journal of Neuroscience, 3, 1039–1057.
Margoliash, D. (1997). Functional organization of forebrain pathways for song production and perception. Journal of Neurobiology, 33, 671–693.
Margoliash, D., & Fortune, E. S. (1992). Temporal and harmonic combination-sensitive neurons in the zebra finch’s HVc. Journal of Neuroscience, 12, 4309–4326.
Marmarelis, P. N., & Marmarelis, V. Z. (1978). Analysis of physiological systems: The white noise approach. New York: Plenum Press.
Mooney, R. (2000). Different subthreshold mechanisms underlie song selectivity in identified HVc neurons of the zebra finch. Journal of Neuroscience, 20, 5420–5436.
Ingham, N. J., & McAlpine, D. (2004). Spike-frequency adaptation in the inferior colliculus. Journal of Neurophysiology, 91(2), 632–645.
Nelken, I., Prut, Y., Vaadia, E., & Abeles, M. (1994). In search of the best stimulus: An optimization procedure for finding efficient stimuli in the cat auditory cortex. Hearing Research, 72(1–2), 237–253.
O’Connor, K. N., Petkov, C. I., & Sutter, M. L. (2005). Adaptive stimulus optimization for auditory cortical neurons. Journal of Neurophysiology, 94, 4051–4067.
Paninski, L. (2003). Convergence properties of three spike-triggered analysis techniques. Network-Computation in Neural Systems, 14, 437–464.
Reid, R. C., & Alonso, J. M. (1995). Specificity of monosynaptic connections from thalamus to visual cortex. Nature, 378, 281–284.
Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey primary visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.
Rust, N. C., Schwartz, O., Movshon, J. A., & Simoncelli, E. P. (2005). Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6), 945–956.
Sakai, H. M., & Naka, K. (1987). Signal transmission in the catfish retina: V. sensitivity and circuit. Journal of Neurophysiology, 58, 1329–1350.
Schwartz, O., Pillow, J. W., Rust, N. C., & Simoncelli, E. P. (2006). Spike-triggered neural characterization. Journal of Vision, 6, 484–507.
Schwartz, O., & Simoncelli, E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4, 819–825.
Shapley, R., & Enroth-Cugell, C. (1984). Visual adaptation and retinal gain controls. In N. Osborne & G. Chader (Eds.), Progress in retinal research (Vol. 3, pp. 263–346). London: Pergamon.
Sharpee, T., Rust, N. C., & Bialek, W. (2004). Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Computation, 16(2), 223–250.
Simoncelli, E. P., Paninski, L., Pillow, J., & Schwartz, O. (2004). Characterization of neural responses with stochastic stimuli. In M. Gazzaniga (Ed.), The cognitive neurosciences (3rd ed., pp. 327–338). Cambridge, MA: MIT Press.
Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9, 1135–1151.
Theunissen, F. E., David, S. V., Singh, N. C., Hsu, A., Vinje, W. E., & Gallant, J. L. (2001). Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network-Computation in Neural Systems, 12, 289–316.
Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. Journal of Neuroscience, 20, 2315–2331.
Xiao, Z., & Suga, N. (2004). Reorganization of the auditory cortex specialized for echo-delay processing in the mustached bat. Proceedings of the National Academy of Sciences of the United States of America, 101, 1769–1774.
Yamane, Y., Carlson, E. T., Bowman, K. C., Wang, Z., & Connor, C. E. (2008). A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nature Neuroscience, 11(11), 1352–1360.
Yan, J., & Suga, N. (1996). Corticofugal modulation of time-domain processing of biosonar information in bats. Science, 273, 1100–1103.
Acknowledgements
We thank Teresa Nick for inspiring this line of research, Greg Horwitz for helpful discussions, and the anonymous reviewers for their suggestions. This research was supported by the National Science Foundation grant DMS-0719724 (DQN).
Author information
Authors and Affiliations
Corresponding author
Additional information
Action Editor: Liam Paninski
Appendices
Appendix A: Linear kernel is gradient of smoothed function
This appendix provides a proof of the following: if the noise Z is Gaussian, then the expected value of the linear kernel \(\mathbf{h}(\hat{\mathbf{x}})\) will point in the direction of the gradient of \(F_{\text{ave}}(\mathbf{X})=E(F(\mathbf{X}+\mathbf{Z}))\), the smoothed version of F(X). As a consequence, the expected value of the linear kernel will point in the direction in which the function \(F_{\text{ave}}\) increases most rapidly.
If one adds zero mean noise Z with covariance matrix C Z to the original stimulus \(\hat{\mathbf{x}}\) and presents the combined stimulus \(\mathbf{X}=\hat{\mathbf{x}}+\mathbf{Z}\) to a neuron with spiking probability given by Eq. (1), the expected value of the linear kernel (Eq. (4)) would be
where the \(\mu_R\) term drops out because Z has zero mean. Because \(E(R|\mathbf{Z})=F(\hat{\mathbf{x}}+\mathbf{Z})\), we can rewrite this as
where \(\rho_Z\) is the probability density of the noise.
If the noise is Gaussian, then its probability density function is that of the multivariate normal distribution
We will use this to rewrite the integrand in Eq. (9). Differentiating \(\rho_Z(\mathbf{z})\) with respect to the ith component of z, we calculate
where the subscript i indicates the ith component of the vector in parentheses, and we used the fact that the covariance matrix is symmetric. Combining this for all components, we find that
so that Eq. (9) becomes
Integrating each component by parts transfers the derivative onto F; the boundary terms vanish because \(\rho_Z(\mathbf{z})\) goes to zero for large values of z. We find that the expected value of the linear kernel is the expected value of the gradient of F, averaged over the noise,
This result is related to Stein’s Lemma (Stein 1981).
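As a numerical illustration of this relationship, the sketch below (an illustrative toy, not the paper's code) checks that the Monte Carlo estimate of \(E(F(\hat{\mathbf{x}}+\mathbf{Z})\mathbf{Z})/\sigma^2\) for isotropic Gaussian noise matches the gradient of the smoothed function. The response function F is an assumed Gaussian bump, chosen because its smoothing integral has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 3, 0.5
x_hat = np.array([0.3, -0.2, 0.1])

def F(x):
    # Hypothetical smooth response function (an assumption, not the
    # paper's model): a Gaussian bump centered at the origin.
    return np.exp(-0.5 * np.sum(x**2, axis=-1))

# Monte Carlo estimate of E(F(x+Z) Z)/sigma^2 for isotropic Gaussian Z,
# which by the result above equals the gradient of F_ave at x.
N = 500_000
Z = rng.normal(0.0, sigma, size=(N, n))
kernel = (F(x_hat + Z)[:, None] * Z).mean(axis=0) / sigma**2

# For this F the smoothing integral has a closed form,
# F_ave(x) = (1 + sigma^2)^(-n/2) exp(-|x|^2 / (2 (1 + sigma^2))),
# so the gradient of the smoothed function is available exactly.
c = 1.0 + sigma**2
grad_exact = -x_hat / c * c**(-n/2) * np.exp(-(x_hat @ x_hat) / (2*c))
print(kernel, grad_exact)  # the two should agree to Monte Carlo error
```

For non-isotropic noise the factor \(1/\sigma^2\) would be replaced by multiplication with \(C_Z^{-1}\), in keeping with the covariance term in the derivation.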
If we define \(F_{\text{ave}}(\mathbf{X})\) to be a version of F(X) that is smoothed by the noise,
then \(F_{\text{ave}}(\hat{\mathbf{x}}) = E(F(\hat{\mathbf{x}}+\mathbf{Z})) = E(R) = E(\mu_R(\mathbf{X}))\) and \(\nabla F_{\text{ave}}(\hat{\mathbf{x}}) = E(\nabla F(\hat{\mathbf{x}}+\mathbf{Z}))= E(\mathbf{h}(\hat{\mathbf{x}}))\). We see that the expected value of the linear kernel is the gradient of \(F_{\text{ave}}\) and that the expected value of the linear approximation (3) of F is the differential linear approximation (2) of \(F_{\text{ave}}\):
In particular, this means the expected value of the linear kernel \(\mathbf{h}(\hat{\mathbf{x}})\) defined by Eq. (4) will point in the direction in which the smoothed version (Eq. (12)) of F increases most rapidly.
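The ascent procedure that this result supports — estimate the linear kernel from noise-perturbed stimuli, then move along it — can be sketched as follows. The model "neuron," trial counts, and the crude grid line search are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 4, 0.3

def mean_rate(x):
    # Hypothetical smooth response F(x) standing in for a neuron; the
    # paper's model neurons are more elaborate.
    return np.exp(-0.5 * np.sum((x - 1.0)**2))

def spikes(x):
    # Noisy spike count whose mean is proportional to F(x).
    return rng.poisson(20.0 * mean_rate(x)) / 20.0

def linear_kernel(x_hat, trials=2000):
    # Spike-weighted average of the noise, estimating E(R Z)/sigma^2;
    # subtracting the empirical mean response plays the role of mu_R.
    Z = rng.normal(0.0, sigma, size=(trials, n))
    R = np.array([spikes(x_hat + z) for z in Z])
    return ((R - R.mean())[:, None] * Z).mean(axis=0) / sigma**2

x_hat = np.zeros(n)
for _ in range(8):                      # successive linear approximations
    h = linear_kernel(x_hat)
    h /= np.linalg.norm(h)
    # Crude stand-in for the paper's line search: best of a few noisy steps.
    steps = np.linspace(0.0, 2.0, 21)
    vals = [np.mean([spikes(x_hat + s*h) for _ in range(50)]) for s in steps]
    x_hat = x_hat + steps[int(np.argmax(vals))] * h

print(x_hat)  # should have moved toward the response peak at (1, 1, 1, 1)
```

Because each kernel estimate points (in expectation) along the gradient of the smoothed response, the loop performs approximate gradient ascent on \(F_{\text{ave}}\).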
Appendix B: Calculating noise-averaged response along a line
For the line search algorithm, the goal is to locate, along the line \(\mathbf{x}_s = \hat{\mathbf{x}}_i + s \mathbf{h}(\hat{\mathbf{x}}_i)\) specified by the linear kernel, the maximum of
where \(\rho_Z(\mathbf{z})\) is the probability density of a single realization of the noise Z. This search can require many data points. In this appendix, we explain how to reduce the number of data points needed.
The simplest idea would be to estimate \(F_{\text{ave}}(\mathbf{x}_s)\) at points along the line by generating multiple realizations of the noise Z and taking the average number of spikes elicited by the stimulus \(\mathbf{x}_s + \mathbf{Z}\). In principle, this method would work, but it would require many realizations of the noise for each point, making the line search very expensive in terms of the amount of data that would need to be collected from a neuron.
Another way to estimate the integral for \(F_{\text{ave}}(\mathbf{x}_s)\), at least in principle, would be to sample Z uniformly over a bounded subset of \(R^n\) (one containing most of the mass of \(\rho_Z(\mathbf{z})\)), and then take the average of the number of spikes elicited by \(\mathbf{x}_s + \mathbf{Z}\), weighted by \(\rho_Z(\mathbf{Z})\). This would be a grossly inefficient way to estimate the whole integral, as one would waste effort collecting data for unlikely Z and then multiply the result by a very small \(\rho_Z(\mathbf{Z})\).
We demonstrate how a combination of the two approaches can be used to efficiently estimate a series of values of \(F_{\text{ave}}(\mathbf{x}_s)\) along the line \(\mathbf{x}_s= \hat{\mathbf{x}}_i + s \mathbf{h}(\hat{\mathbf{x}}_i)\). The first step is to separate integral (13) into two parts: the one-dimensional integral in the direction of h and the (n − 1)-dimensional integral in the orthogonal complement. Let \(z_h = \mathbf{z} \cdot \mathbf{h}\) be the magnitude of z in the direction of the linear kernel and let \(\mathbf{z}_\perp = \mathbf{z} - z_h \mathbf{h}\) denote the perpendicular components of z. Write the probability distribution as \(\rho_Z(\mathbf{z}) = \rho_h(z_h)\,\rho_\perp(\mathbf{z}_\perp | z_h)\), where \(\rho_h(z_h)\) is the marginal distribution of \(z_h\) and \(\rho_\perp(\mathbf{z}_\perp | z_h)\) is the distribution of \(\mathbf{z}_\perp\) conditioned on a value of \(z_h\). The integral (13) evaluated at \(\mathbf{X}=\mathbf{x}_s=\hat{\mathbf{x}}_i + s \mathbf{h}\) can hence be rewritten as
Next, approximate the conditional distribution \(\rho_\perp(\mathbf{z}_\perp | z_h)\) as though it did not depend on \(z_h\), replacing it with the marginal distribution \(\rho_\perp(\mathbf{z}_\perp)\). This approximation allows us to rewrite the (n − 1)-dimensional integral so that it does not explicitly depend on s. We simply change variables to \(u = z_h + s\) in the one-dimensional integral, obtaining
Given a fixed \(\hat{\mathbf{x}}_i\) and h, the (n − 1)-dimensional integral is just a function of the parameter u,
and the approximation of \(F_{\text{ave}}(\mathbf{x}_s)\) is just a convolution of I(u) with the marginal density \(\rho_h\)
To minimize the amount of data needed to simultaneously estimate \(F_{\text{ave}}(\mathbf{x}_s)\) for many s, we collect data to estimate I(u) using Eq. (16) rather than estimating Eq. (13) directly. We sample data along a uniformly spaced grid of u and weight the results in proportion to \(\rho_h(u - s)\), according to Eq. (17). This procedure decreases the efficiency of calculating a single \(F_{\text{ave}}(\mathbf{x}_s)\), as some samples will be multiplied by a small \(\rho_h(u - s)\). However, since we can reuse the same data points to estimate \(F_{\text{ave}}(\mathbf{x}_s)\) at many different points \(\mathbf{x}_s\), the procedure increases the efficiency of estimating a whole series of \(F_{\text{ave}}(\mathbf{x}_s)\).
To estimate \(F_{\text{ave}}(\mathbf{x}_s)\) via Eq. (17), we need to approximate I(u). We use the following algorithm to sample data whose expected value is I(u). We generate one independent realization Z of the noise and subtract off its projection onto h, creating \(\mathbf{Z}_\perp = \mathbf{Z} - (\mathbf{Z}\cdot\mathbf{h})\mathbf{h}\). Then, we present the stimulus \(\mathbf{X} = \hat{\mathbf{x}}_i+u\mathbf{h} + \mathbf{Z}_\perp\) to the neuron and record the number of spikes elicited, denoting the result by S(u). In this way, E(S(u)) = I(u). Although we would have to repeat this step many times and average S(u) to obtain a reasonable estimate of I(u), we sample only one data point per u because we are not interested in I(u) itself.
The estimate of \(F_{\text{ave}}(\mathbf{x}_s)\) via Eq. (17) also requires the marginal distribution \(\rho_h(z_h)\) of \(z_h = \mathbf{Z} \cdot \mathbf{h}\), which could depend on the direction h. Depending on the noise distribution \(\rho_Z(\mathbf{z})\), this marginal could be calculated analytically or estimated via Monte Carlo simulation using many realizations of the noise Z. (Since this sampling does not involve the neuron, the concerns about data efficiency do not apply here.) As discussed in Section 2.2.1, non-Gaussian noise Z may introduce artifacts into \(F_{\text{ave}}(\mathbf{x}_s)\), though one might still use non-Gaussian Z if the neuron did not respond well to \(\mathbf{x}_s + \mathbf{Z}\) for Gaussian Z. Even in that case, the calculation of \(\rho_h\) does not involve stimulating the neuron, so the argument for obtaining a robust neuronal response does not constrain the choice of \(\rho_h\). Hence, we recommend approximating \(\rho_h\) by a normal distribution with standard deviation equal to the σ of the noise magnitude.
Combining these calculations, we estimate \(F_{\text{ave}}(\mathbf{x}_s)\) along the line via Eq. (17) as follows. First, for a given step size Δs, we sample I(u) at grid points kΔs, obtaining S(kΔs). Each S(kΔs) is the number of spikes elicited from the neuron by a single presentation of the stimulus \(\mathbf{X} =\hat{\mathbf{x}}_i + k\Delta s\, \mathbf{h} + \mathbf{Z}_\perp\), where for each k we generate an independent realization of Z to calculate \(\mathbf{Z}_\perp = \mathbf{Z} - (\mathbf{Z}\cdot\mathbf{h})\mathbf{h}\). Second, we numerically sample \(\rho_h(j\Delta s)\) for integer j between − J and J, where \(J = \lceil 2\sigma/\Delta s \rceil\), normalizing the result to vector components \(\hat{\rho}_j\) that sum to 1. This estimate of \(\hat{\rho}_j\) does not involve any presentation of stimuli to the neuron. We obtain the estimate of \(F_{\text{ave}}(\mathbf{x}_s)\), for \(\mathbf{x}_s = \hat{\mathbf{x}}_i + s\mathbf{h}\):
As the spike counts S(jΔs) appear in estimates for many T(kΔs), Eq. (18) can be used to efficiently estimate \(F_{\text{ave}}\) along the line x s .
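A minimal sketch of this estimation scheme follows, with a hypothetical Poisson model neuron standing in for real recordings and an assumed isotropic Gaussian noise, so that \(\rho_h\) is an exact normal marginal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, ds = 4, 0.3, 0.1             # dimension, noise s.d., grid step Δs
x_hat = np.zeros(n)                    # current stimulus
h = np.ones(n) / 2.0                   # unit vector along the linear kernel

def spikes(x):
    # Hypothetical model neuron: Poisson spike count with a Gaussian
    # tuning bump (an assumption, standing in for real recordings).
    return rng.poisson(20.0 * np.exp(-0.5 * np.sum((x - 1.0)**2)))

# First step: sample S(kΔs), one presentation per grid point, with the
# noise projected into the orthogonal complement of h.
K = 40
S = np.empty(K)
for k in range(K):
    Z = rng.normal(0.0, sigma, n)
    Z_perp = Z - (Z @ h) * h
    S[k] = spikes(x_hat + k*ds*h + Z_perp)

# Second step: normalized weights rho_hat_j for j = -J..J, here from the
# exact marginal N(0, sigma^2) of z_h for isotropic Gaussian noise.
J = int(np.ceil(2*sigma/ds))
j = np.arange(-J, J+1)
rho = np.exp(-0.5 * (j*ds/sigma)**2)
rho /= rho.sum()

# Eq. (18) as a discrete convolution: T(kΔs) = sum_j rho_hat_j S((k+j)Δs).
T = np.array([rho @ S[k + j] for k in range(J, K - J)])
print((T.argmax() + J) * ds)  # step s with the largest estimated response
```

Each spike count S(kΔs) contributes to 2J + 1 of the T values, which is the source of the data efficiency described above.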
T(s) is the weighted average of 2J + 1 samples of spike counts S(·). The differing weights mean that T(s) is not invariant under permutation of the S, making it trickier to estimate the variance in T via resampling. We estimated confidence intervals of T via an approximate jackknife algorithm (Efron 1982). We divided the samples into ten parts, where each part is composed of every tenth S(jΔs). Although the dependence of T on those ten parts was not exactly invariant under permutation, each part sampled the range of weights \(\hat{\rho}_j\). We obtained a jackknife estimate of the standard deviation δT(s) of T(s) based on calculations of T that omit one of those parts.
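The interleaved ten-part jackknife might look like the following sketch, with synthetic spike counts and weights that are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.poisson(10.0, size=41).astype(float)   # synthetic spike counts S(jΔs)
w = np.exp(-0.5 * np.linspace(-2, 2, 41)**2)   # weights rho_hat_j (Gaussian)
w /= w.sum()

T_full = np.dot(w, S)   # T(s): weighted average of the 2J+1 samples

def T_omitting(part, n_parts=10):
    # Recompute T leaving out one interleaved part (every tenth sample),
    # renormalizing the remaining weights.
    keep = np.arange(len(S)) % n_parts != part
    return np.dot(w[keep], S[keep]) / w[keep].sum()

T_parts = np.array([T_omitting(p) for p in range(10)])
# Delete-one-group jackknife estimate of the standard deviation (Efron 1982).
delta_T = np.sqrt((10 - 1) / 10 * np.sum((T_parts - T_parts.mean())**2))
print(T_full, delta_T)
```

Because each omitted part is interleaved, every leave-one-out replicate still spans the full range of weights \(\hat{\rho}_j\), as the text notes.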
We calculate T(s) initially for 0 ≤ s ≤ 4σ. Then, we continue to calculate T(s) for larger s, comparing each new value T(s) with those from the previous window (s − 4σ, s). Denoting the minimum and maximum value of T in the window by m and M, respectively, we stop calculating additional T(s) if T(s) has dropped to or below the window minimum (T(s) ≤ m), or if T(s) has not changed much over the window and does not appear to be rising (M − m ≤ the mean δT in the window and T(s) < the maximum T − δT in the window).
When either criterion is met, we stop the line search and find \(s_{\text{max}}\), defined as the value of s that maximizes T(s) over all values calculated. If \(T(s_{\text{max}}) - \delta T(s_{\text{max}})\) is greater than our previous maximum estimate of \(F_{\text{ave}}(\mathbf{X})\), then we conclude we have found a new stimulus \(\mathbf{x}_{s_{\text{max}}} = \hat{\mathbf{x}}_i + s_{\text{max}} \mathbf{h}(\hat{\mathbf{x}}_i)\) that appears to elicit a higher firing rate in the neuron and set \(\hat{\mathbf{x}}_{i+1} = \mathbf{x}_{s_{\text{max}}}\).
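The stopping and acceptance logic can be sketched as below. The windowing details reflect one reading of the criteria above (in particular, "maximum T − δT in the window" is interpreted as the window maximum minus the largest δT in the window), so treat this as an assumption rather than the authors' code:

```python
import numpy as np

sigma, ds = 0.5, 0.1
window = int(round(4*sigma/ds))   # grid points spanning a window of width 4σ

def line_search_stop(T, dT, prev_best):
    # Scan T(s) with the two stopping criteria, then accept the best point
    # only if it beats the previous maximum by more than its error bar.
    for i in range(window, len(T)):
        win_T, win_dT = T[i-window:i], dT[i-window:i]
        m, M = win_T.min(), win_T.max()
        fell = T[i] <= m                                    # dropped below window
        flat = (M - m <= win_dT.mean()) and (T[i] < M - win_dT.max())
        if fell or flat:
            break
    i_best = int(np.argmax(T[:i+1]))
    accept = T[i_best] - dT[i_best] > prev_best
    return i_best, accept

# Toy profile: T rises to a peak at s = 2 then falls; small error bars.
s = np.arange(60) * ds
T_toy = np.exp(-(s - 2.0)**2)
dT_toy = np.full_like(T_toy, 0.01)
print(line_search_stop(T_toy, dT_toy, prev_best=0.2))
```

On this toy profile the search stops shortly after the peak and accepts the grid point at s = 2 as the new stimulus.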
Cite this article
Koelling, M.E., Nykamp, D.Q. Searching for optimal stimuli: ascending a neuron’s response function. J Comput Neurosci 33, 449–473 (2012). https://doi.org/10.1007/s10827-012-0395-7