Abstract
Available gene selection methods have high computational complexity. This paper applies the 1-norm support vector machine with squared loss (1-norm SVMSL), a previously proposed variant of the 1-norm support vector machine (1-norm SVM), to fast gene selection for cancer classification. In principle, the 1-norm SVMSL can perform gene selection and classification at the same time. To improve classification performance, however, we use the 1-norm SVMSL only as a gene selector and adopt a subsequent classifier on the selected genes. Extensive experiments on four DNA microarray data sets indicate that the 1-norm SVMSL selects genes much faster than other methods: it is almost an order of magnitude faster than the 1-norm SVM, and at least four orders of magnitude faster than SVM-RFE (recursive feature elimination), a state-of-the-art method.
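The core idea behind this family of methods is that a 1-norm (lasso-type) penalty on a squared-loss fit drives most gene weights exactly to zero, so the nonzero weights directly identify the selected genes. The sketch below illustrates this with a generic iterative soft-thresholding (ISTA) solver on synthetic data; it is an illustrative stand-in, not the paper's 1-norm SVMSL algorithm, and the data sizes, the regularization parameter `lam`, and all function names are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # elementwise soft-thresholding, the proximal map of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_squared_loss_selector(X, y, lam=8.0, n_iter=2000):
    """Minimize 0.5*||X w - y||^2 + lam*||w||_1 by ISTA.

    Nonzero entries of the returned weight vector mark selected genes.
    """
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)           # gradient of the squared loss
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Synthetic microarray-like data: 40 samples, 200 "genes", first 5 informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 200))
true_w = np.zeros(200)
true_w[:5] = [2.0, -2.0, 1.5, -1.5, 1.0]
y = np.sign(X @ true_w)                    # class labels in {-1, +1}

w = l1_squared_loss_selector(X, y)
selected = np.flatnonzero(w)               # indices of selected genes
print("number of selected genes:", selected.size)
```

As in the paper's two-stage setup, the sparse fit serves only as the selector: a separate classifier would then be trained on the columns `X[:, selected]`.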
References
Becker S, Bobin J, Candès E (2011) NESTA: A fast and accurate first-order method for sparse recovery. SIAM J Imaging Sci 4(1):1–39
Bennett KP (1999) Combining support vector and mathematical programming methods for classification. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods: support vector learning. MIT Press, Cambridge, pp 307–326
Bi J, Bennett KP, Embrechts M, Breneman CM, Song M (2003) Dimensionality reduction via sparse support vector machines. J Mach Learn Res 3:1229–1243
Cao J, Zhang L, Wang B, Li F, Yang J (2015) A fast gene selection method for multi-cancer classification using multiple support vector data description. J Biomed Inform 53(2):381–389
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/cjlin/libsvm
Cui Y, Zheng CH, Yang J, Sha W (2013) Sparse maximum margin discriminant analysis for feature extraction and gene selection on gene expression data. Comput Biol Med 43(7):933–941
Davis G, Mallat S, Avellaneda M (1997) Adaptive greedy approximations. Constr Approx 13:57–98
Demiriz A, Bennett KP, Shawe-Taylor J (2002) Linear programming boosting via column generation. Mach Learn 46(1):225–254
Donoho D, Elad M, Temlyakov V (2006) Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans Inf Theory 52:6–18
Duan KB, Rajapakse JC, Wang H, Azuaje F (2005) Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Trans Nanobioscience 4(3):228–234
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Amer Stat Assoc 97(457):77–87
Fung GM, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28:185–202
Girosi F (1998) An equivalence between sparse approximation and support vector machines. Neural Comput 10(6):1455–1480
Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, Bloomfield C, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5436):531–537
Gordon G, Jensen R, Hsiao L, Gullans S, Blumenstock J, Ramaswamy S, Richards W, Sugarbaker D, Bueno R (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62(17):4963–4967
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1-3):389–422
Lee C, Leu Y (2011) A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 11(1):208–213
Li JT, Jia YM, Li WL (2011) Adaptive huberized support vector machine and its application to microarray classification. Neural Comput Appl 20(1):123–132
Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17(12):1131–1142
Liu H, Liu L, Zhang H (2010) Ensemble gene selection by grouping for microarray data classification. J Biomed Inform 43(1):81–87
Makhorin A (2012) GLPK (GNU linear programming kit). http://www.gnu.org/software/glpk/glpk.html
Maldonado S, Montoya R, López J (2017) Embedded heterogeneous feature selection for conjoint analysis: A SVM approach using L1 penalty. Appl Intell 46:775–787
Mangasarian OL (2000) Generalized support vector machines. In: Smola A, Bartlett P, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 135–146
Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7:1517–1530
Pomeroy S, Tamayo P, Gaasenbeek M, Sturla L, Angelo M, McLaughlin M, Kim J, Goumnerova L, Black P, Lau C, Allen J, Zagzag D, Olson J, Curran T, Wetmore C, Biegel J, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis D, Mesirov J, Lander E, Golub T (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442
Shah S, Kusiak A (2007) Cancer gene search with data-mining and genetic algorithms. Comput Biol Med 37(2):251–261
Shen Q, Mei Z, Ye BX (2009) Simultaneous genes and training samples selection by modified particle swarm optimization for gene expression data classification. Comput Biol Med 39(7):646–649
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Steinwart I (2003) Sparseness of support vector machines. J Mach Learn Res 4(6):1071–1105
Stodden V, Carlin L, Donoho D, et al (2007) SparseLab: Seeking sparse solutions to linear systems of equations. http://sparselab.stanford.edu/
Thi HAL, Tao PD, Thiao M (2016) Efficient approaches for L2-L0 regularization and applications to feature selection in SVM. Appl Intell 45(2):1–17
Tropp J, Gilbert A (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
Van’t Veer LJ, Dai H, van de Vijver M, He Y, Hart A, Mao M, Peterse H, Van der Kooy K, Marton M, Witteveen A, Schreiber G, Kerkhoven R, Roberts C, Linsley P, Bernards R, Friend S (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik V (1999) The overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999
Wang C, Cao L, Miao B (2013) Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data. Comput Stat Data Anal 66(10):140–149
Wang HQ, Wong HS, Zhu H, Yip TT (2009) A neural network-based biomarker association information extraction approach for cancer classification. J Biomed Inform 42(4):654–666
Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V (2000) Feature selection for SVMs. NIPS 12:668–672
Wong HS, Wang HQ (2008) Constructing the gene regulation-level representation of microarray data for cancer classification. J Biomed Inform 41(1):95–105
Zhang L, Huang X (2015) Multiple SVM-RFE for multi-class gene selection on DNA microarray data. In: Proceedings of 2015 international joint conference on neural networks, pp 897–902
Zhang L, Zhou W (2013) Analysis of programming properties and the row–column generation method for 1-norm support vector machines. Neural Netw 48(12):32–43
Zhang L, Zhou W, Zhang Z, Yang J (2015) A fast approximation algorithm for 1-norm svm with squared loss. In: Proceedings of 2015 IEEE International Joint Conference on Neural Networks (IJCNN)
Zhang L, Zhou W (2010) On the sparseness of 1-norm support vector machines. Neural Netw 23(3):373–385
Zhang L, Zhou W (2013) A fast algorithm for kernel 1-norm support vector machines. Knowl-Based Syst 52(16):223–235
Zhou W, Zhang L, Jiao L (2002) Linear programming support vector machines. Pattern Recogn 35(12):2927–2936
Zhou X, Tuck DP (2007) MSVM-RFE: Extensions of SVM-RFE for multi-class gene selection on DNA microarray data. Bioinformatics 23(9):1106–1114
Zhu J, Rosset S, Hastie T, Tibshirani R (2004) 1-norm support vector machines. In: Thrun S, Saul L, Schölkopf B (eds) Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, pp 49–56
Acknowledgements
This study was funded by the National Natural Science Foundation of China (grant numbers 61373093, 61672364, and 61672365), by the Natural Science Foundation of Jiangsu Province of China (grant number BK20140008), by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number 13KJA520001), and by the Soochow Scholar Project.
About this article
Cite this article
Zhang, L., Zhou, W., Wang, B. et al. Applying 1-norm SVM with squared loss to gene selection for cancer classification. Appl Intell 48, 1878–1890 (2018). https://doi.org/10.1007/s10489-017-1056-3