Abstract
Consistency modelling for gene selection is a new topic emerging from recent cancer bioinformatics research. The results of operations such as classification, clustering, or gene selection on a training set are often found to differ markedly from the results of the same operations on a testing set, which presents a serious consistency problem. In practice, the inconsistency of microarray datasets prevents many typical gene selection methods from working properly for cancer diagnosis and prognosis. To address this problem, this paper proposes a new concept of classification consistency and applies it to the microarray gene selection problem using a bootstrapping approach, with encouraging results.
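The abstract does not define the consistency measure itself, so the following is only an illustrative sketch of the general idea, not the method proposed in the paper. It assumes a simple mean-difference filter score, a top-k selection rule, and an overlap ratio between the genes selected on a bootstrap sample and those selected on the corresponding out-of-bag samples; the function names `top_k_genes` and `selection_consistency` and all parameters are hypothetical.

```python
import numpy as np

def top_k_genes(X, y, k):
    """Select k genes with the largest class-mean difference relative to
    the summed within-class standard deviations (an assumed filter score)."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (
        a.std(axis=0) + b.std(axis=0) + 1e-12
    )
    return set(np.argsort(score)[::-1][:k].tolist())

def selection_consistency(X, y, k=5, n_boot=50, seed=0):
    """Average overlap (0..1) between genes selected on a bootstrap sample
    and genes selected on its out-of-bag samples (an assumed measure)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    overlaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # samples never drawn
        # Skip resamples in which either part lacks one of the two classes.
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[oob])) < 2:
            continue
        in_bag = top_k_genes(X[idx], y[idx], k)
        out_bag = top_k_genes(X[oob], y[oob], k)
        overlaps.append(len(in_bag & out_bag) / k)
    return float(np.mean(overlaps)) if overlaps else float("nan")

# Synthetic data for illustration only: 60 samples, 50 genes, 2 classes.
rng = np.random.default_rng(1)
y = np.array([0, 1] * 30)
X_noise = rng.normal(size=(60, 50))
X_signal = X_noise.copy()
X_signal[y == 1, :5] += 8.0   # make the first 5 genes strongly informative

c_signal = selection_consistency(X_signal, y)
c_noise = selection_consistency(X_noise, y)
print(c_signal, c_noise)
```

On data with a strong class signal, the selected genes should largely agree between the two parts of each resample, whereas on pure noise the overlap should be close to chance. A low value of such a score is one way to see the training/testing inconsistency the abstract describes.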
Acknowledgments
The research presented in this paper was partially funded by the New Zealand Foundation for Research, Science and Technology under grant NERF/AUTX02-01.
Cite this article
Pang, S., Havukkala, I., Hu, Y. et al. Classification consistency analysis for bootstrapping gene selection. Neural Comput & Applic 16, 527–539 (2007). https://doi.org/10.1007/s00521-007-0110-1
DOI: https://doi.org/10.1007/s00521-007-0110-1