Abstract
Interactions among neurons are a key component of neural signal processing. Rich neural data sets that potentially contain evidence of interactions can now be collected readily in the laboratory, but existing analysis methods are often not sufficiently sensitive and specific to reveal these interactions. Generalized linear models offer a platform for analyzing multi-electrode recordings of neuronal spike train data. Here we suggest an L1-regularized logistic regression model (L1L method) to detect short-term (on the order of 3 ms) neuronal interactions. We estimate the parameters in this model using a coordinate descent algorithm, and determine the optimal tuning parameter using a Bayesian Information Criterion. Simulation studies show that in general the L1L method has better sensitivity and specificity than the traditional shuffle-corrected cross-correlogram (covariogram) method. The L1L method is able to detect excitatory interactions with both high sensitivity and specificity with reasonably large recordings, even when the magnitude of the interactions is small; similar results hold for inhibition given sufficiently high baseline firing rates. Our study also suggests that false positives can be further removed by thresholding, because their magnitudes are typically smaller than those of true interactions. Simulations also show that the L1L method is somewhat robust to partially observed networks. We apply the method to multi-electrode recordings collected in the monkey dorsal premotor cortex (PMd) while the animal prepares to make reaching arm movements. The results show that some neurons interact differently depending on task conditions. The stronger interactions detected with our L1L method were also visible using the covariogram method.
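As a concrete sketch of the pipeline described above, fitting L1-penalized logistic regression over a grid of penalties and choosing the tuning parameter by BIC, the following Python fragment is illustrative only: the simulated design, the penalty grid, and the use of scikit-learn (rather than the paper's own coordinate descent code) are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical binned spike data: each row is a time bin, each column indicates
# whether a candidate "presynaptic" neuron fired in the preceding bin.
n, P = 2000, 3
X = rng.binomial(1, 0.2, size=(n, P)).astype(float)
beta_true = np.array([1.5, 0.0, 0.0])          # only neuron 0 truly interacts
prob = 1.0 / (1.0 + np.exp(-(-2.0 + X @ beta_true)))
y = rng.binomial(1, prob)

def bic(model, X, y):
    """BIC = -2 * loglik + (number of nonzero terms) * log(n)."""
    p = model.predict_proba(X)[:, 1]
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    k = np.count_nonzero(model.coef_) + 1      # +1 for the unpenalized intercept
    return -2.0 * loglik + k * np.log(len(y))

# C is scikit-learn's inverse regularization strength (roughly 1/gamma).
fits = [(bic(m, X, y), C, m)
        for C in (0.01, 0.05, 0.1, 0.5, 1.0, 10.0)
        for m in [LogisticRegression(penalty="l1", C=C, solver="liblinear").fit(X, y)]]
_, C_best, m_best = min(fits, key=lambda t: t[0])
support = np.flatnonzero(m_best.coef_[0])
print("selected C:", C_best, "detected interactions:", support)
```

Counting only the nonzero coefficients (plus the unpenalized intercept) as the model dimension follows the lasso degrees-of-freedom result of Zou et al. (2007).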
References
Aertsen, A. M. H. J., Gerstein, G. L., Habib, M. K., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of ‘effective connectivity’. Journal of Neurophysiology, 61, 900–917.
Avalos, M., Grandvalet, Y., & Ambroise, C. (2003). Regularization methods for additive models. In Advances in intelligent data analysis V.
Batista, A. P., Santhanam, G., Yu, B. M., Ryu, S. I., Afshar, A., & Shenoy, K. V. (2007). Reference frames for reach planning in macaque dorsal premotor cortex. Journal of Neurophysiology, 98, 966–983.
Brillinger, D. R. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59, 189–200.
Brody, C. D. (1999). Correlations without synchrony. Neural Computation, 11, 1537–1551.
Brown, E. N., Kass, R. E., & Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience, 7(5), 456–461.
Chen, Z., Putrino, D. F., Ghosh, S., Barbieri, R., & Brown, E. N. (2010). Statistical inference for assessing functional connectivity of neuronal ensembles with sparse spiking data. IEEE Transactions on Neural Systems and Rehabilitation Engineering.
Chestek, C. A., Batista, A. P., Santhanam, G., Yu, B. M., Afshar, A., Cunningham, J. P., et al. (2007). Single-neuron stability during repeated reaching in macaque premotor cortex. Journal of Neuroscience, 27(40), 10742–10750.
Czanner, G., Grun, S., & Iyengar, S. (2005). Theory of the snowflake plot and its relations to higher-order analysis methods. Neural Computation, 17, 1456–1479.
Ecker, A. S., Berens, P., Keliris, G. A., Bethge, M., Logothetis, N. K., & Tolias, A. S. (2010). Decorrelated neuronal firing in cortical microcircuits. Science, 327(5965), 584–587.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.
Eldawlatly, S., Jin, R., & Oweiss, K. G. (2009). Identifying functional connectivity in large-scale neural ensemble recordings: A multiscale data mining approach. Neural Computation, 21, 450–477.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Friedman, J., Hastie, T., Hofling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1(2), 302–332.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Fujisawa, S., Amarasingham, A., Harrison, M. T., & Buzsaki, G. (2008). Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nature Neuroscience, 11(7), 823–833.
Gao, Y., Black, M. J., Bienenstock, E., Wei, W., & Donoghue, J. P. (2003). A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In First intl. IEEE/EMBS conf. on neural eng. (pp. 189–192).
Gerstein, G. L., & Perkel, D. H. (1972). Mutual temporal relationships among neuronal spike trains: Statistical techniques for display and analysis. Biophysical Journal, 12, 453–473.
Harrison, M. T., & Geman, S. (2009). A rate and history-preserving resampling algorithm for neural spike trains. Neural Computation, 21, 1244–1258.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer-Verlag.
Kass, R. E., Kelly, R. C., & Loh, W. (2011). Assessment of synchrony in multiple neural spike trains using loglinear point process models. Annals of Applied Statistics, 5(2B), 1262–1292. (Special Section on Statistics and Neuroscience)
Kelly, R. C., Smith, M. A., Kass, R. E., & Lee, T. S. (2010). Accounting for network effects in neuronal responses using L1 regularized point process models. In Advances in Neural Information Processing Systems (Vol. 23, pp. 1099–1107).
Kohn, A., & Smith, M. A. (2005). Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience, 25(14), 3661–3673.
Kulkarni, J. E., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(5), 375–407.
Matsumura, M., Chen, D., Sawaguchi, T., Kubota, K., & Fetz, E. E. (1996). Synaptic interactions between primate precentral cortex neurons revealed by spike-triggered averaging of intracellular membrane potentials in vivo. Journal of Neuroscience, 16(23), 7757–7767.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
Meinshausen, N., & Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1), 246–270.
Mishchenko, Y., Vogelstein, J. T., & Paninski, L. (2011). A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Annals of Applied Statistics, 5(2B), 1229–1261. (Special Section on Statistics and Neuroscience)
Moran, D. W., & Schwartz, A. B. (1999). Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology, 82, 2676–2692.
Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.
Park, M. Y., & Hastie, T. (2007). L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society, Series B, 69(4), 659–677.
Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486), 735–746.
Perkel, D. H., Gerstein, G. L., & Moore, G. P. (1967). Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophysical Journal, 7, 414–440.
Perkel, D. H., Gerstein, G. L., Smith, M. S., & Tatton, W. G. (1975). Nerve-impulse patterns: A quantitative display technique for three neurons. Brain Research, 100, 271–296.
Qian, G., & Wu, Y. (2006). Strong limit theorems on the model selection in generalized linear regression with binomial responses. Statistica Sinica, 16, 1335–1365.
Reid, R. C., & Alonso, J. M. (1995). Specificity of monosynaptic connections from thalamus to visual cortex. Nature, 378(16), 281–284.
Rosset, S. (2004). Following curved regularized optimization solution paths. In Advances in Neural Information Processing Systems.
Santhanam, G., Sahani, M., Ryu, S., & Shenoy, K. (2004). An extensible infrastructure for fully automated spike sorting during online experiments. In Conf. proc. IEEE eng. med. biol. soc. (Vol. 6, pp. 4380–4384).
Stevenson, I. H., Rebesco, J. M., Hatsopoulos, N. G., Haga, Z., Miller, L. E., & Kording, K. P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE TNSRE (Special Issue on Brain Connectivity), 17(3), 203–213.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.
Truccolo, W., Hochberg, L. R., & Donoghue, J. P. (2010). Collective dynamics in human and monkey sensorimotor cortex: Predicting single neuron spikes. Nature Neuroscience, 13(1), 105–111.
Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society, Series B, 71(3), 671–683.
Wasserman, L., & Roeder, K. (2009). High-dimensional variable selection. Annals of Statistics, 37, 2178–2201.
Wu, T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2(1), 224–244.
Zhao, M., & Iyengar, S. (2010). Nonconvergence in logistic and poisson models for neural spiking. Neural Computation, 22, 1231–1244.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370, 140–143.
Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Annals of Statistics, 35(5), 2173–2192.
Acknowledgements
We thank Trevor Hastie and Erin Crowder for their advice during the early stages of this work. We thank Ashwin Iyengar for help with the scalable vector figures. The simulations were done using PITTGRID. We also thank the Action Editor and reviewers for their thoughtful comments.
Action Editor: Rob Kass
Appendix
The proof draws on two theorems and two lemmas from Qian and Wu (2006), one theorem from Fan and Li (2001), and one lemma from Park and Hastie (2007). For these to hold, we inherit conditions (C.1)–(C.14) of Qian and Wu (2006) and conditions (A)–(C) of Fan and Li (2001); we refer the reader to those papers for the details. Without elaborating those conditions here, we restate the quoted results as the lemmas used in our proof. Intuitively, conditions (C.1)–(C.6) are requirements on link functions in general, which the logit link does not violate (Qian and Wu 2006). Conditions (C.7)–(C.13) are requirements on the covariates, ensuring that no observation dominates as the sample size tends to infinity. Conditions (C.14) and (A)–(C) are requirements on the log-likelihood function, under which classical likelihood theory applies.
We denote by \(\boldsymbol\beta_0\) the true values of a collection of P parameters, of which only p are nonzero. Here we assume that both p and P are finite and do not vary with the sample size n. Denote the log-likelihood function for logistic regression by l. Let \(\mathcal{C}\) and \(\mathcal{W}\) be the sets of all correct models and all wrong models, respectively. \(\hat{\boldsymbol\beta}_c\) stands for the unregularized MLE under the assumption of model \(c\in\mathcal{C}\), and \(\hat{\boldsymbol\beta}_w\) for the unregularized MLE under the assumption of model \(w\in\mathcal{W}\). \(\hat{\boldsymbol\beta}(\gamma)\) stands for the L1-regularized estimates at tuning parameter γ. A subscript c or w on \(\hat{\boldsymbol\beta}(\gamma)\) means that the nonzero estimates in \(\hat{\boldsymbol\beta}(\gamma)\) form model c or w.
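For readability, we record here the form of the BIC score that the differences below are consistent with; this restatement (with the lasso degrees of freedom counted as the number of nonzero coefficients, following Zou et al. 2007) is an assumed convenience, and the exact definition is given in the main text:

\[\mathrm{BIC}(\gamma)=-2\,l\big(\hat{\boldsymbol\beta}(\gamma)\big)+d(\gamma)\log n,\]

where \(d(\gamma)\) denotes the number of nonzero components of \(\hat{\boldsymbol\beta}(\gamma)\).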
Lemma 1
(Theorem 2 in Qian and Wu 2006) Under (C.1)–(C.14), for any correct model \(c\in\mathcal{C}\), \(l(\hat{\boldsymbol\beta}_{c})-l(\boldsymbol\beta_{0})=O(\log\log n)\) a.s.
Lemma 2
(Theorem 3 in Qian and Wu 2006) Under (C.1)–(C.14), for any wrong model \(w\in\mathcal{W}\), \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_{w})=O(n)>0\) a.s. for n large.
Lemma 3
(Theorem 1 in Fan and Li 2001) Under (A)–(C), there exists a local maximizer \(\hat{\boldsymbol\beta}(\gamma)\) of the L1-regularized log-likelihood such that \(\parallel\hat{\boldsymbol\beta}(\gamma)-\boldsymbol\beta_0\parallel=O_p(n^{-1/2}+\gamma/n)\).
Lemma 4
(Lemma 4 in Qian and Wu 2006) Under (C.1)–(C.14), each component of \(\frac{\partial l}{\partial\boldsymbol\beta}(\boldsymbol\beta_0)\) is \(O(\sqrt{n\log\log n})\) a.s.
Lemma 5
(Lemma 6 in Qian and Wu 2006) Under (C.1)–(C.14), there exist two positive numbers \(d_1\) and \(d_2\) such that the eigenvalues of \(-\partial^2l/\partial\boldsymbol\beta\partial\boldsymbol\beta'\) at \(\boldsymbol\beta_0\) are bounded between \(d_1n\) and \(d_2n\) a.s. as n goes to infinity.
Lemma 6
(Lemma 1 in Park and Hastie 2007) If the intercept in the logistic model is not regularized, then when \(\gamma>\max_j\mid(\frac{\partial l}{\partial\boldsymbol\beta})_j\mid\), j = 1, ..., P, the intercept is the only nonzero coefficient.
Proof of the Theorem
Let γ 1 > γ 2. Denote by \(m_1\) the model consisting of the \(d_1\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_1)\), and by \(m_2\) the model consisting of the \(d_2\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_2)\). Therefore,
\[\mathrm{BIC}(\gamma_1)-\mathrm{BIC}(\gamma_2)=2\big[l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}(\gamma_1))\big]+(d_1-d_2)\log n.\]
If \(m_1\in\mathcal{C}\) and \(m_2\in\mathcal{C}\), by Lemma 1 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(\log\log n)>0\). By the definition of maximum likelihood, we also have \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})<0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))=o(\log n)\), we get BIC(γ 1) − BIC(γ 2) < 0, and the correct model \(m_1\), with the smaller number of parameters, is selected.
If \(m_1\in\mathcal{W}\) and \(m_2\in\mathcal{C}\), by Lemma 2 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(n)>0\). Again by the definition of maximum likelihood, we have \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))>0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})=o(n)\), we get BIC(γ 1) − BIC(γ 2) > 0, and the correct model \(m_2\) is selected.
It remains to show that, for any \(c\in\mathcal{C}\), we have \(l(\hat{\boldsymbol\beta}_{c})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). Because \(l(\hat{\boldsymbol\beta}_{c})-l(\boldsymbol\beta_{0})=O(\log\log n)\), it suffices to show \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). By a Taylor expansion, we have
\[l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=-\frac{\partial l}{\partial\boldsymbol\beta}(\boldsymbol\beta_0)'\big(\hat{\boldsymbol\beta}_c(\gamma)-\boldsymbol\beta_0\big)+\frac{1}{2}\big(\hat{\boldsymbol\beta}_c(\gamma)-\boldsymbol\beta_0\big)'\Big(-\frac{\partial^2l}{\partial\boldsymbol\beta\partial\boldsymbol\beta'}(\boldsymbol\beta^*)\Big)\big(\hat{\boldsymbol\beta}_c(\gamma)-\boldsymbol\beta_0\big),\]
where \(\boldsymbol\beta^*\) lies between \(\boldsymbol\beta_0\) and \(\hat{\boldsymbol\beta}_c(\gamma)\).
So by Lemmas 3, 4 and 5, we have
\[l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=O_p\Big(\sqrt{n\log\log n}\,\big(n^{-1/2}+\gamma/n\big)\Big)+O_p\Big(n\big(n^{-1/2}+\gamma/n\big)^2\Big).\]
When \(\gamma=o(\sqrt{n\log n})\), we have \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\).
Finally, Lemma 6 says that when \(\gamma>\max_j\mid(\frac{\partial l}{\partial\boldsymbol\beta})_j\mid=O(\sqrt{n\log\log n})\), the fit is the null model containing only an intercept, so we never need a tuning parameter γ larger than \(O(\sqrt{n\log\log n})=o(\sqrt{n\log n})\). Therefore \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\) is achievable for all correct models given by \(\hat{\boldsymbol\beta}(\gamma)\), and the BIC γ-selector selects the correct model with the smallest number of parameters among all the submodels that \(\hat{\boldsymbol\beta}(\gamma)\) presents.
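For completeness, the path estimates \(\hat{\boldsymbol\beta}(\gamma)\) studied above are computed in practice by cyclic coordinate descent on an iteratively reweighted quadratic approximation of the log-likelihood (Friedman et al. 2010). The sketch below is a minimal, illustrative Python implementation; the simulated data and all variable names are assumptions of this example, not the code used in the paper.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def l1_logistic_cd(X, y, lam, n_outer=25, n_inner=50, tol=1e-6):
    """Minimize -(1/n) loglik + lam * ||beta||_1 with an unpenalized intercept,
    via IRLS quadratic approximations solved by cyclic coordinate descent."""
    n, P = X.shape
    b0, beta = 0.0, np.zeros(P)
    for _ in range(n_outer):
        eta = b0 + X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(p * (1 - p), 1e-5, None)   # IRLS weights
        z = eta + (y - p) / w                  # working response
        for _ in range(n_inner):
            beta_old = beta.copy()
            b0 = np.sum(w * (z - X @ beta)) / np.sum(w)   # unpenalized intercept
            for j in range(P):
                # partial residual excluding coordinate j's contribution
                r = z - b0 - X @ beta + X[:, j] * beta[j]
                beta[j] = soft(np.mean(w * X[:, j] * r), lam) / np.mean(w * X[:, j] ** 2)
            if np.max(np.abs(beta - beta_old)) < tol:
                break
    return b0, beta

# Illustrative use on simulated binary covariates: only the first coefficient
# is truly nonzero, so the L1 penalty should zero out most of the rest.
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.3, size=(1500, 4)).astype(float)
eta = -1.5 + 1.2 * X[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
b0, beta = l1_logistic_cd(X, y, lam=0.02)
print("intercept:", round(b0, 3), "coefficients:", np.round(beta, 3))
```

The soft-threshold step is what produces exact zeros in \(\hat{\boldsymbol\beta}(\gamma)\), so varying `lam` traces out the sparse path over which the BIC selector operates.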
Zhao, M., Batista, A., Cunningham, J. P., et al. An L1-regularized logistic model for detecting short-term neuronal interactions. Journal of Computational Neuroscience, 32, 479–497 (2012). https://doi.org/10.1007/s10827-011-0365-5