
Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems

Authors: R. Mitchell Parry, May D. Wang

BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pages 29 - 38

https://doi.org/10.1145/2147805.2147809

Published: 01 August 2011

Abstract

Selecting an appropriate classifier for a particular biological application poses a difficult problem for researchers and practitioners alike. We propose a novel measure for assessing the suitability of machine classifiers for particular problems called "win percentage." We define win percentage as the probability a classifier will perform better than its peers on a finite random sample of feature sets, giving each classifier equal opportunity to find suitable features. We illustrate the utility of this method using synthetic data. Then, we evaluate six classifiers in analyzing eight microarray datasets representing three diseases: breast cancer, multiple myeloma, and neuroblastoma. Fundamentally, we illustrate that the selection of the most suitable classifier (i.e., one that is more likely to perform better than its peers) not only depends on the dataset and application but also on the thoroughness of feature selection. In particular, win percentage provides a single measurement that could assist users in eliminating or selecting classifiers for their particular application and will be accessible from www.biomiblab.org.
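The definition of win percentage in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation. It assumes that each classifier has a pool of performance scores, one per candidate feature set, that each classifier draws the same number of feature sets at random (with replacement) and keeps its best score, and that ties are split evenly. The function name `win_percentage`, the `score_pools` input format, and the toy scores are all hypothetical.

```python
import random


def win_percentage(score_pools, n_feature_sets, n_trials=10_000, seed=0):
    """Estimate each classifier's win percentage by Monte Carlo simulation.

    score_pools: dict mapping a classifier name to a list of scores,
    one score per candidate feature set (hypothetical input format).
    n_feature_sets: how many random feature sets each classifier may try.
    """
    rng = random.Random(seed)
    wins = {name: 0.0 for name in score_pools}
    for _ in range(n_trials):
        # Every classifier gets the same number of random draws and
        # keeps the best score it finds among them.
        best = {
            name: max(rng.choices(pool, k=n_feature_sets))
            for name, pool in score_pools.items()
        }
        top = max(best.values())
        winners = [name for name, score in best.items() if score == top]
        for name in winners:
            wins[name] += 1.0 / len(winners)  # assumption: split ties evenly
    return {name: w / n_trials for name, w in wins.items()}


# Toy data (made up): "steady" scores 0.80 on every feature set, while
# "spiky" scores 0.70 on most feature sets and 0.90 on one in ten.
pools = {"steady": [0.80] * 10, "spiky": [0.70] * 9 + [0.90]}
print(win_percentage(pools, n_feature_sets=1))
print(win_percentage(pools, n_feature_sets=50))
```

In this toy setting the steady classifier wins most trials when only one feature set is tried, and the spiky classifier wins most trials when fifty are tried, which mirrors the abstract's point that the most suitable classifier depends on the thoroughness of feature selection.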


Index Terms

1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Machine learning


Published In


BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine

August 2011

688 pages

ISBN:9781450307963

DOI:10.1145/2147805

General Chairs: Robert Grossman (University of Chicago), Andrey Rzhetsky (University of Chicago)

Program Chairs: Sun Kim (Indiana University Bloomington and Seoul National University), Wei Wang (University of North Carolina at Chapel Hill)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article


Conference

BCB '11: ACM International Conference on Bioinformatics, Computational Biology and Biomedicine

August 1 - 3, 2011

Chicago, Illinois

Sponsor: SIGBio

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%


Bibliometrics

Article Metrics (reflects downloads up to 08 Mar 2025)

Total Citations: 0
Total Downloads: 75
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0
