ABSTRACT
Correlation-based filtering methods for gene selection have been shown to be quite effective for microarray data analysis, and hundreds of such methods have been proposed in the literature. In this paper, we extend the notion of correlation between genes and sample status: the relation between a gene vector and the label vector is considered unique if it cannot be replicated by randomly shuffling the gene expression values or the sample status data. A two-layer statistical analysis is performed on the original microarrays and on label-shuffled data to identify the important gene markers. We design a simple metric, the difference of signal-to-noise between the positive and negative classes, that does not work well for directly selecting the top informative genes (as verified with a linear SVM classifier); however, after collecting and ranking the second-level significance values of every gene on the original and many shuffled microarray data sets, the top selected genes show much better classification performance. Results on several public microarray data sets show that genes selected by our method also lead to high leave-one-out prediction accuracy.
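The abstract does not give the exact formulas, so the following is only a rough sketch of the shuffle-then-rank idea under stated assumptions. It assumes the per-class signal-to-noise is mean divided by standard deviation, that the first-level metric is the absolute difference of the two class values, and that the second-level significance is an empirical p-value of the observed metric against label-shuffled replicates. The function names (`snr_difference`, `shuffle_significance`, `select_genes`) and the parameters `n_shuffles` and `seed` are hypothetical and not taken from the paper.

```python
import numpy as np


def snr_difference(X, y, eps=1e-12):
    """First-level metric (assumed form): per-gene |mean/std| gap between classes.

    X is a (samples x genes) expression matrix, y a 0/1 label vector.
    """
    pos, neg = X[y == 1], X[y == 0]
    snr_pos = pos.mean(axis=0) / (pos.std(axis=0) + eps)
    snr_neg = neg.mean(axis=0) / (neg.std(axis=0) + eps)
    return np.abs(snr_pos - snr_neg)


def shuffle_significance(X, y, n_shuffles=200, seed=0):
    """Second-level value (assumed form): empirical p-value per gene.

    The metric on the true labels is compared with the same metric
    computed on n_shuffles random permutations of the labels.
    """
    rng = np.random.default_rng(seed)
    observed = snr_difference(X, y)
    null = np.empty((n_shuffles, X.shape[1]))
    for i in range(n_shuffles):
        null[i] = snr_difference(X, rng.permutation(y))
    # Add-one correction keeps every p-value strictly positive.
    p = (1 + (null >= observed).sum(axis=0)) / (n_shuffles + 1)
    return observed, p


def select_genes(X, y, k, n_shuffles=200, seed=0):
    """Return indices of the k genes with the smallest p-values.

    Ties in p are broken by the larger observed metric.
    """
    observed, p = shuffle_significance(X, y, n_shuffles=n_shuffles, seed=seed)
    order = np.lexsort((-observed, p))  # last key (p) is the primary sort key
    return order[:k]
```

In this sketch, a call such as `select_genes(X, y, k=50)` would return candidate gene indices that could then be passed to a linear SVM with leave-one-out validation, as the abstract describes; the classifier step is omitted here.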
Index Terms
- An effective filtering gene selection method for microarray data via shuffling and statistical analysis