Abstract
Top Scoring Pair (TSP) and its ensemble counterpart, k-Top Scoring Pair (k-TSP), were recently introduced as competitive options for classifying microarray data. However, the support vector machine (SVM) that was compared with these approaches had no feature (variable) selection mechanism, whereas TSP is itself a kind of variable selection algorithm. Moreover, an ensemble of SVMs should also be considered as a possible competitor to k-TSP. In this work, we conducted a fair comparison between TSP and SVM-recursive feature elimination (SVM-RFE) as feature selection methods for SVM. We also compared k-TSP with two ensemble methods that use SVM as their base classifier. Results on ten public-domain microarray data sets indicated that TSP family classifiers serve as good feature selection schemes that may be combined effectively with other classification methods.
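To make concrete the idea of TSP acting as a feature selector, the following is a minimal sketch, not the authors' implementation. It assumes the pair score as it is commonly defined for TSP, namely the absolute difference between the two classes in the frequency with which gene i is expressed below gene j, and it picks disjoint pairs greedily. The function names `tsp_score` and `select_pairs` and the toy data are hypothetical, and the sketch omits tie-breaking between equally scored pairs as well as the downstream SVM training.

```python
from itertools import combinations

def tsp_score(X, y, i, j):
    """|P(X_i < X_j | class 0) - P(X_i < X_j | class 1)| for genes i and j."""
    freq = []
    for label in (0, 1):
        rows = [x for x, c in zip(X, y) if c == label]
        # Fraction of samples in this class where gene i is below gene j.
        freq.append(sum(x[i] < x[j] for x in rows) / len(rows))
    return abs(freq[0] - freq[1])

def select_pairs(X, y, k=1):
    """Greedily pick up to k highest-scoring gene pairs that share no gene."""
    n_genes = len(X[0])
    ranked = sorted(combinations(range(n_genes), 2),
                    key=lambda p: tsp_score(X, y, *p), reverse=True)
    pairs, used = [], set()
    for i, j in ranked:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
        if len(pairs) == k:
            break
    return pairs

# Toy data (made up): 4 samples x 4 genes; genes 0 and 1 swap order between classes.
X = [[1.0, 2.0, 5.0, 5.5],
     [1.5, 3.0, 4.0, 4.2],
     [3.0, 1.0, 5.0, 5.1],
     [2.5, 0.5, 4.5, 4.4]]
y = [0, 0, 1, 1]

pairs = select_pairs(X, y, k=1)
# Gene indices that would be passed on to another classifier such as an SVM.
selected_genes = sorted({g for pair in pairs for g in pair})
```

Under these assumptions, the columns listed in `selected_genes` are the reduced feature set; restricting the expression matrix to them before training a separate classifier is one way to read "TSP as feature selection".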
Notes
http://faculty.vassar.edu/lowry/kappa.htm. Accessed 11 Mar 2009
Acknowledgements
The authors thank the anonymous reviewers for their valuable comments, which improved the presentation of this paper. The work of S. Kim was supported by the Special Research Grant of Sogang University 200811028.01.
Cite this article
Yoon, S., Kim, S. k-Top Scoring Pair Algorithm for feature selection in SVM with applications to microarray data classification. Soft Comput 14, 151–159 (2010). https://doi.org/10.1007/s00500-009-0437-x
DOI: https://doi.org/10.1007/s00500-009-0437-x