k-Top Scoring Pair Algorithm for feature selection in SVM with applications to microarray data classification

Yoon, Sejong; Kim, Saejoon

doi:10.1007/s00500-009-0437-x

Focus
Published: 03 June 2009

Volume 14, pages 151–159 (2010)
Soft Computing

Abstract

Top Scoring Pair (TSP) and its ensemble counterpart, k-Top Scoring Pair (k-TSP), were recently introduced as competitive methods for classifying microarray data. However, the support vector machine (SVM) against which these approaches were compared has no built-in feature (variable) selection mechanism, whereas TSP is itself a form of variable selection. Moreover, an ensemble of SVMs should also be considered as a possible competitor to k-TSP. In this work, we conducted a fair comparison between TSP and SVM-recursive feature elimination (SVM-RFE) as feature selection methods for SVM. We also compared k-TSP with two ensemble methods that use SVM as their base classifier. Results on ten public-domain microarray datasets indicated that TSP-family classifiers serve as good feature selection schemes that may be combined effectively with other classification methods.
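To make concrete why TSP "is itself a form of variable selection", here is a minimal sketch, assuming the pairwise rule of Geman et al. (2004) and the disjoint-pair selection of Tan et al. (2005), both in the reference list below. A gene pair (i, j) is scored by |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, and k-TSP keeps the k highest-scoring pairs that share no gene. The function names `tsp_scores`, `top_disjoint_pairs` and `selected_genes`, and the toy data, are hypothetical; this is not the authors' code, and the secondary rank-based tie-breaking score used by the published methods is omitted.

```python
from itertools import combinations


def tsp_scores(X, y):
    """Return {(i, j): |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|} for all gene pairs i < j.

    X is a list of samples (each a list of expression values, one per gene);
    y holds binary labels 0/1, and both classes must be present.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    scores = {}
    for i, j in combinations(range(len(X[0])), 2):
        # Empirical frequency of the ordering "gene i below gene j" in each class.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        scores[(i, j)] = abs(p0 - p1)
    return scores


def top_disjoint_pairs(scores, k):
    """Greedily keep the k best-scoring pairs that share no gene (k-TSP style).

    Ties are broken by pair index only; the secondary rank-based
    tie-breaking score of the published methods is omitted here.
    """
    chosen, used = [], set()
    for (i, j), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
            if len(chosen) == k:
                break
    return chosen


def selected_genes(pairs):
    """Flatten the chosen pairs into a sorted list of gene indices."""
    return sorted({g for pair in pairs for g in pair})


# Hypothetical toy data: 6 samples x 4 genes, two classes.
X = [
    [1.0, 5.0, 2.0, 3.0],
    [2.0, 6.0, 1.0, 3.5],
    [1.5, 4.0, 2.5, 3.2],
    [5.0, 1.0, 2.2, 3.1],
    [6.0, 2.0, 1.8, 3.3],
    [4.0, 0.5, 2.1, 3.4],
]
y = [0, 0, 0, 1, 1, 1]

scores = tsp_scores(X, y)
best = top_disjoint_pairs(scores, k=1)
features = selected_genes(best)  # column subset that could be handed to an SVM
```

In this toy set the pairs (0, 1), (0, 3) and (1, 3) all reach the maximal score of 1, so only the index tie-break picks (0, 1). Restricting an SVM's input columns to `features` is one plausible way to use TSP as a feature filter; this page does not show exactly how the paper combines the two.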

Notes

http://leo.ugr.es/elvira/DBCRepository/index.html. Accessed 12 Jun 2008
http://faculty.vassar.edu/lowry/kappa.htm. Accessed 11 Mar 2009

References

Alizadeh AA, Eisen MB, Davis RE et al (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511
Alon U, Barkai N, Notterman DA et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750
Beer DG, Kardia SL, Huang CC et al (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8(8):816–824
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Buciu I, Kotropoulos C, Pitas I (2006) Demonstrating the stability of support vector machines for classification. Signal Process 86(9):2364–2380
Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Ding C, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol 3(2):185–205
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
Geman D, d’Avignon C, Naiman D, Winslow R (2004) Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol 3(1):19
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Gordon GJ, Jensen RV, Hsiao LL et al (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62:4963–4967
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
Joachims T (1999) Making large-scale support vector machine learning practical. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 169–184
Kim HC, Pang S, Je HM, Kim D, Bang SY (2003) Constructing support vector machine ensemble. Pattern Recognit 36(12):2757–2767
Lai C, Reinders M, van ’t Veer L, Wessels L (2006) A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets. BMC Bioinformatics 7:235. doi:10.1186/1471-2105-7-235
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208
Pomeroy SL, Tamayo P, Gaasenbeek M et al (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442
Rosenwald A, Wright G, Chan WC et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346(25):1937–1947
Shipp MA, Ross KN, Tamayo P et al (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8(1):68–74
Singh D, Febbo PG, Ross K et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D (2005) Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 21(20):3896–3904
Vapnik VN (1998) Statistical learning theory. Wiley-Interscience
Wigle DA, Jurisica I, Radulovich N et al (2002) Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Res 62:3005–3008
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments, which improved the presentation of this paper. The work of S. Kim was supported by the Special Research Grant of Sogang University 200811028.01.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Sogang University, Seoul, 121-742, Korea
Sejong Yoon & Saejoon Kim

Corresponding author

Correspondence to Saejoon Kim.

About this article

Cite this article

Yoon, S., Kim, S. k-Top Scoring Pair Algorithm for feature selection in SVM with applications to microarray data classification. Soft Comput 14, 151–159 (2010). https://doi.org/10.1007/s00500-009-0437-x

Published: 03 June 2009
Issue Date: January 2010
DOI: https://doi.org/10.1007/s00500-009-0437-x
