Sparse optimization in feature selection: application in neuroimaging

Kampa, K.; Mehta, S.; Chou, C. A.; Chaovalitwongse, W. A.; Grabowski, T. J.

doi:10.1007/s10898-013-0134-2

Sparse optimization in feature selection: application in neuroimaging

Published: 08 January 2014

Volume 59, pages 439–457, (2014)
Cite this article

Journal of Global Optimization Aims and scope Submit manuscript

K. Kampa^1,2,
S. Mehta^3,4,
C. A. Chou⁵,
W. A. Chaovalitwongse^1,2,6 &
…
T. J. Grabowski^2,6,7

949 Accesses
25 Citations
Explore all metrics

Abstract

Feature selection plays an important role in the successful application of machine learning techniques to large real-world datasets. Avoiding model overfitting, especially when the number of features far exceeds the number of observations, requires selecting informative features and/or eliminating irrelevant ones. Searching for an optimal subset of features can be computationally expensive. Functional magnetic resonance imaging (fMRI) produces datasets with such characteristics creating challenges for applying machine learning techniques to classify cognitive states based on fMRI data. In this study, we present an embedded feature selection framework that integrates sparse optimization for regularization (or sparse regularization) and classification. This optimization approach attempts to maximize training accuracy while simultaneously enforcing sparsity by penalizing the objective function for the coefficients of the features. This process allows many coefficients to become zero, which effectively eliminates their corresponding features from the classification model. To demonstrate the utility of the approach, we apply our framework to three different real-world fMRI datasets. The results show that regularized classifiers yield better classification accuracy, especially when the number of initial features is large. The results further show that sparse regularization is key to achieving scientifically-relevant generalizability and functional localization of classifier features. The approach is thus highly suited for analysis of fMRI data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Amaldi, E., Kann, V.: On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theor. Comput. Sci. 209(1), 237–260 (1998)
Article Google Scholar
Chou, C.-A., Kampa, K., Mehta, S.H., Tungaraza, R.F., Chaovalitwongse, W.A., Grabowski, T.J.: Information-theoretic based feature selection for multi-voxel pattern analysis of fMRI data. In: Brain Informatics, pp. 196–208. Springer (2012)
Chou, C.-A., Kampa, K., Mehta, S.H., Tungaraza, R.F., Chaovalitwongse, W.A., Grabowski, T.J.: Voxel selection framework in multi-voxel pattern analysis of fMRI signals for prediction of neural response to visual stimuli. IEEE Trans. Med. Imag., under review (2013)
Chu, C., Kyun, K.S., Kunle, O.: Map-reduce for machine learning on multicore. Adv. Neural Inf. Process. Syst. 19, 281 (2007)
Google Scholar
Coutanche, M.N., Thompson-Schill, S.L.: The advantage of brief fmri acquisition runs for multi-voxel pattern detection across runs. Neuroimage 61(4), 1113–1119 (2012)
Article Google Scholar
Cui, Y., Jin, J., Zhang, S., Luo, S., Tian, Q.: Correlation-based feature selection and regression. In: Qiu, G., Lam, K., Kiya, H., Xue, X.-Y., Kuo, C.-C., Lew, M. (eds.) Advances in Multimedia Information Processing—PCM 2010, vol. 6297 of Lecture Notes in Computer Science, pp. 25–35. Springer, Berlin, Heidelberg (2010) ISBN 978-3-642-15701-1
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Article Google Scholar
Desikan, R.S., Ségonne, F., Fischl, B., Blacker, D., et al.: An automated labeling system for subdividing the human cerebral cortex on mri scans into gyral based regions of interest. Neuroimage 31(3), 968–980 (2006)
Article Google Scholar
Dy, J.G., Brodley, C.E.: Feature selection for unsupervised learning. J. Mach. Learn. Res. 5, 845–889 (2004)
Google Scholar
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Article Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York, NY (2009)
Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Soft. 33(1), 1 (2010a)
Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: Lasso (l1) and elastic-net regularized generalized linear models (2010b). http://www-stat.stanford.edu/tibs/glmnet-matlab/
Fuchs, J.-J.: On the application of the global matched filter to DOA estimation with uniform circular arrays. IEEE Trans. Signal Process. 49(4), 702–709 (2001)
Article Google Scholar
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Google Scholar
Guyon, I., Weston, J., Barnhil, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)
Article Google Scholar
Hanke, M., Halchenko, Y.O., Sederberg, P.B., Haxby, J.V.: Pymvpa: A python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics 7(1), 37–53 (2009)
Article Google Scholar
Hanson, S.J., Matsuka, T., Haxby, J.V.: Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: is there a face area? Neuroimage 23(1), 156–166 (2001)
Article Google Scholar
Haxby, J.V., Gobbini, M.I., Ishai, A., Pietrini, P.: Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293(5539), 2425–2430 (2001)
Article Google Scholar
Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Faces and objects in ventral temporal cortex (fMRI). http://data.pymvpa.org/datasets/haxby2001/ (2010)
Haynes, J.-D., Rees, G.: Decoding mental states from brain activity in humans. Neuroscience 7, 523–534 (2006)
Google Scholar
He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. 18, 507 (2006)
Google Scholar
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Article Google Scholar
Koh, K., Kim, S.-J., Boyd, S.: An interior-point method for large-scale l1-regularized logistic regression. J. Mach. Learn. Res. 8(8), 1519–1555 (2007)
Google Scholar
Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
Article Google Scholar
Komarek, P.: Logistic regression for data mining and high-dimensional classification. Robotics Institute, p. 222 (2004)
Krause, A., Guestrin, C.: Near-optimal nonmyopic value of information in graphical models. arXiv, preprint arXiv:1207.1394 (2012)
Krause, A., Guestrin, C., Gupta, A., Kleinberg, J.: Near-optimal sensor placements: maximizing information while minimizing communication cost. In: Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pp. 2–10. ACM (2006)
Le Cun, L.B.Y., Bottou, L.: Large scale online learning. Adv. Neural Inf. Process. Syst. 16, 217 (2004)
Google Scholar
Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 17(4), 491–502 (2005)
Article Google Scholar
Lovász, L.: Submodular functions and convexity. In: Mathematical Programming: The State of the Art, pp. 235–257. Springer (1983)
Mangasarian, O.L.: Minimum-support solutions of polyhedral concave programs*. Optimization 45(1–4), 149–162 (1999)
Article Google Scholar
Misaki, M., Kim, Y., Bandettini, P.A., Kriegeskorte, N.: Comparison of multivariate classifiers and response normalizations for pattern-information fMRI. NeuroImage 53(1), 103–118 (2010)
Article Google Scholar
Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.-M., Malave, V.L., Mason, R.A., Just, M.A.: Predicting human brain activity associated with the meanings of nouns. Science 320, 1191–1195 (2008)
Article Google Scholar
Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.-M., Malave, V.L., Mason, R.A., Just, M.A.: Supplemental web site in support of the paper: predicting human brain activity associated with the meanings of nouns, September (2009). http://www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html/
Mumford, J.A., Turner, B.O., Ashby, F.G., Poldrack, R.A.: Deconvolving bold activation in event-related designs for multivoxel pattern classification analyses. NeuroImage 59(3), 2636–2643 (2012)
Article Google Scholar
Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V.: Beyond mind-reading: multi-voxel pattern analysis of fMRI data. RENDS Cogn. Sci. 10(9), 424–430 (2006)
Article Google Scholar
O’toole, A.J., Jiang, F., Abdi, H.: Partially distributed representations of objects and faces in ventral temporal cortex. J. Cogn. Neurosci. 17(4), 580–590 (2005)
Article Google Scholar
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005). ISSN 0162–8828. doi:10.1109/TPAMI.2005.159
Google Scholar
Pereira, F., Mitchell, T., Botvinick, M.: Machine learning classifiers and fMRI: a tutorial overview. NeuroImage 45, 199–209 (2009)
Article Google Scholar
Poldrack, R.A., Mumford, J.A., Nichols, T.E.: Handbook of Functional MRI Data Analysis. Cambridge University Press, Cambridge (2011)
Book Google Scholar
Quinlan, J.R.: C4. 5: Programs for Machine Learning, vol. 1. Morgan Kaufmann, Los Altos (1993)
Google Scholar
Reunanen, J.: Overfitting in making comparisons between variable selection methods. J. Mach. Learn. Res. 3, 1371–1382 (2003)
Google Scholar
Robnik-Šikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53(1–2), 23–69 (2003)
Article Google Scholar
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Article Google Scholar
Song, L., Smola, A., Gretton, A., Borgwardt, K. M., Bedo, J.: Supervised feature selection via dependence estimation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 823–830. ACM (2007)
Thomas, J.A., Cover, T.M.: Elements of Information Theory. Wiley, New York (2006)
Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological), 267–288 (1996)
Tusher, V.G., Tibshirani, R., Chu, G.: Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98(9), 5116–5121 (2001)
Article Google Scholar
Verleysen, M., Rossi, F., François, D.: Advances in feature selection with mutual information. In: Biehl, M., Hammer, B., Verleysen, M., Villmann, T. (eds.) Similarity-Based Clustering, pp. 52–69. Springer, Berlin, Heidelberg (2009) ISBN 978-3-642-01804-6
Vinh, La The, Thang, N.D., Lee, Y.-K.: An improved maximum relevance and minimum redundancy feature selection algorithm based on normalized mutual information. In: International Symposium on Applications and the Internet, IEEE/IPSJ vol. 0, pp. 395–398 (2010)
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V.: Feature selection for SVMs. In: Advances in Neural Information Processing Systems, vol. 13, pp. 668–674. MIT Press (2001)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
Google Scholar
Woolrich, M.W., Ripley, B.D., Brady, M., Smith, S.M.: Temporal autocorrelation in univariate linear modeling of fMRI data. Neuroimage 14(6), 1370–1386 (2001)
Article Google Scholar
Xu, Z., King, I., Jin, R.: Discriminative semi-supervised feature selection via manifold regularization. IEEE Trans. Neural Netw. 21(7), 1033–1047 (2010)
Article Google Scholar
Yu, L., Liu, H.: Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning, pp. 856–863 (2003)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning, p. 116. ACM (2004)
Zhao, Z., Liu, H.: Semi-supervised feature selection via spectral analysis. In: Proceedings of the 7th SIAM International Conference on Data Mining, Minneapolis, MN, pp. 1151–1158 (2007)
Zhao, Z., Morstatter, F., Sharma, S., Alelyani, S., Aneeth, A., Huan, L.: Advancing feature selection research, ASU Feature Selection Repository (2010)
Zhou, N., Wang, L.: A modified t-test feature selection method and its application on the hapmap genotype data. Genomics, Proteomics Bioinf. 5(3), 242–249 (2007)
Article Google Scholar
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Statistical Methodology) 67(2), 301–320 (2005)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, University of Washington, Seattle, WA, USA
K. Kampa & W. A. Chaovalitwongse
Integrated Brain Imaging Center, University of Washington Medical Center, Seattle, WA, USA
K. Kampa, W. A. Chaovalitwongse & T. J. Grabowski
Department of Radiology, University of Washington, Seattle, WA, USA
S. Mehta
Department of Psychology, University of Washington, Seattle, WA, USA
S. Mehta
Department of Systems Science and Industrial Engineering, Binghamton University, State University of New York, Vestal, NY, USA
C. A. Chou
Department of Radiology, University of Washington, Seattle, WA, USA
W. A. Chaovalitwongse & T. J. Grabowski
Department of Neurology, University of Washington, Seattle, WA, USA
T. J. Grabowski

Authors

K. Kampa
View author publications
You can also search for this author in PubMed Google Scholar
S. Mehta
View author publications
You can also search for this author in PubMed Google Scholar
C. A. Chou
View author publications
You can also search for this author in PubMed Google Scholar
W. A. Chaovalitwongse
View author publications
You can also search for this author in PubMed Google Scholar
T. J. Grabowski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to W. A. Chaovalitwongse.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kampa, K., Mehta, S., Chou, C.A. et al. Sparse optimization in feature selection: application in neuroimaging. J Glob Optim 59, 439–457 (2014). https://doi.org/10.1007/s10898-013-0134-2

Download citation

Received: 08 June 2013
Accepted: 18 December 2013
Published: 08 January 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10898-013-0134-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sparse optimization in feature selection: application in neuroimaging

Abstract

Access this article

Similar content being viewed by others

Feature Selection via Sparse Regression for Classification of Functional Brain Networks

Feature Selection for Decoding of Cognitive States in Multiple-Subject Functional Magnetic Resonance Imaging Data

A Robust Feature Selection Method for Classification of Cognitive States with fMRI Data

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Sparse optimization in feature selection: application in neuroimaging

Abstract

Access this article

Similar content being viewed by others

Feature Selection via Sparse Regression for Classification of Functional Brain Networks

Feature Selection for Decoding of Cognitive States in Multiple-Subject Functional Magnetic Resonance Imaging Data

A Robust Feature Selection Method for Classification of Cognitive States with fMRI Data

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation