Abstract
Feature selection plays an important role in the successful application of machine learning techniques to large real-world datasets. Avoiding model overfitting, especially when the number of features far exceeds the number of observations, requires selecting informative features and/or eliminating irrelevant ones. Searching for an optimal subset of features can be computationally expensive. Functional magnetic resonance imaging (fMRI) produces datasets with these characteristics, which creates challenges for applying machine learning techniques to classify cognitive states from fMRI data. In this study, we present an embedded feature selection framework that integrates sparse optimization for regularization (or sparse regularization) with classification. This optimization approach attempts to maximize training accuracy while simultaneously enforcing sparsity by adding a penalty on the feature coefficients to the objective function. The penalty drives many coefficients to exactly zero, which effectively eliminates the corresponding features from the classification model. To demonstrate the utility of the approach, we apply our framework to three different real-world fMRI datasets. The results show that regularized classifiers yield better classification accuracy, especially when the number of initial features is large. The results further show that sparse regularization is key to achieving scientifically relevant generalizability and functional localization of classifier features. The approach is thus well suited to the analysis of fMRI data.
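As a rough illustration of how a coefficient penalty can perform feature selection inside the classifier itself, the sketch below fits an L1-penalized logistic regression by proximal gradient descent on synthetic data. It is not the authors' implementation: the abstract does not specify the loss function or the solver, so the logistic loss, the proximal gradient method, the synthetic data, and all names (`soft_threshold`, `fit_l1_logistic`, `lam`) are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink every entry toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_l1_logistic(X, y, lam=0.05, n_iter=500):
    """Minimize mean logistic loss + lam * ||w||_1 (illustrative sketch only).

    X: (n_samples, n_features) array; y: labels in {0, 1}. No intercept term.
    """
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / (4 n) bounds the Lipschitz
    # constant of the gradient of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        probs = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (probs - y) / n
        # Gradient step on the loss, then soft-thresholding for the penalty;
        # the thresholding is what sets coefficients exactly to zero.
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Synthetic stand-in for an fMRI-like problem (assumed, not real data):
# far more features than observations, only the first five informative.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:5] = 2.0
y = (X @ true_w > 0).astype(float)

w = fit_l1_logistic(X, y)
selected = np.flatnonzero(w)  # indices of features the model keeps
print(f"{selected.size} of {n_features} features have nonzero coefficients")
```

Increasing `lam` in this sketch removes more features, while `lam = 0` reduces it to unregularized logistic regression. In practice the penalty weight would usually be chosen by cross-validation.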
Cite this article
Kampa, K., Mehta, S., Chou, C.A. et al. Sparse optimization in feature selection: application in neuroimaging. J Glob Optim 59, 439–457 (2014). https://doi.org/10.1007/s10898-013-0134-2
DOI: https://doi.org/10.1007/s10898-013-0134-2