Abstract
Selecting reliable genes from large gene expression data sets with high intergene correlation is essential for diagnostic testing and successful treatment. To this end, a rough set based gene selection algorithm is reported that selects a set of genes by maximizing both their relevance and their significance. A gene ontology-based similarity measure is proposed to analyze the functional diversity of the selected genes; it also helps to assess the effectiveness of different gene selection methods. The performance of the rough set based algorithm is compared with that of other gene selection methods using the predictive accuracy of the K-nearest neighbor rule and the support vector machine on two cancer microarray data sets and one arthritis microarray data set. An important finding is that the rough set based gene selection algorithm selects a more functionally diverse set of genes than the existing algorithms.
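The abstract does not give the selection criterion in formulas, so the following is only a minimal sketch of how a relevance and significance based selection could look, not the authors' implementation. It assumes that expression values are already discretized, that the relevance of a gene is the rough set dependency degree of the class labels on that gene, and that the significance of a candidate gene with respect to an already selected gene is the increase in dependency when the two are used together. The names `dependency` and `select_genes`, the greedy forward search, and the toy data are all hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough set dependency degree: fraction of samples in the positive region.

    rows   -- list of samples, each a list of discretized gene values
    labels -- class label of each sample
    attrs  -- indices of the genes used to form equivalence classes
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    # An equivalence class lies in the positive region if it is label-pure.
    positive = sum(len(idx) for idx in blocks.values()
                   if len({labels[i] for i in idx}) == 1)
    return positive / len(rows)


def select_genes(rows, labels, k):
    """Greedy selection (assumed scheme): relevance plus average significance."""
    n_genes = len(rows[0])
    relevance = [dependency(rows, labels, [g]) for g in range(n_genes)]
    # Start with the single most relevant gene.
    selected = [max(range(n_genes), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(g):
            # Significance of g w.r.t. s: gain in dependency from adding g to s.
            sig = sum(dependency(rows, labels, [s, g]) - relevance[s]
                      for s in selected) / len(selected)
            return relevance[g] + sig
        candidates = [g for g in range(n_genes) if g not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (made up): gene 1 duplicates gene 0, gene 2 is complementary.
rows = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 2],
    [2, 2, 1],
    [2, 2, 1],
]
labels = [0, 0, 0, 1, 1, 1]
chosen = select_genes(rows, labels, 2)
print(chosen)
```

In this toy example the duplicated gene adds no significance, so the sketch prefers the less relevant but complementary gene as its second pick. That is the behaviour one would expect from a criterion meant to cope with high intergene correlation.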
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paul, S., Maji, P. (2011). Rough Sets for Selection of Functionally Diverse Genes from Microarray Data. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27172-4_58
DOI: https://doi.org/10.1007/978-3-642-27172-4_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27171-7
Online ISBN: 978-3-642-27172-4
eBook Packages: Computer Science (R0)