Abstract
Microarray data usually contain a high level of noise, including incorrect, noisy, and irrelevant genes. Before microarray data classification takes place, it is desirable to eliminate as much of this noise as possible. One approach to improving the accuracy and efficiency of microarray data classification is to select a small subset of genes from the large, high-dimensional gene expression dataset. Effective gene selection cleans up the existing microarray data and thereby improves its quality. In this paper, we study the effectiveness of gene selection for microarray classification methods. We conducted experiments with two benchmark algorithms, SVMs and C4.5. We observed that, in general, the accuracy and efficiency of SVMs and C4.5 improve when they use the preprocessed datasets rather than the original datasets, but an inappropriate choice of genes can only be detrimental to predictive power. Our results also imply that, with preprocessing, the number of genes selected affects classification accuracy.
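The workflow described in the abstract (select a small gene subset, then classify, then compare accuracy across subset sizes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the gene selection technique, so a signal-to-noise filter ranking is swapped in; a nearest-centroid classifier stands in for SVMs and C4.5 to keep the example dependency-free; and the data are synthetic. All function names and parameters are hypothetical.

```python
import random
import statistics


def make_synthetic(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Toy expression matrix (assumption): only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


def signal_to_noise(X, y):
    """Per-gene score |mean1 - mean0| / (sd1 + sd0); higher means more discriminative."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        denom = statistics.stdev(a) + statistics.stdev(b)
        diff = abs(statistics.mean(b) - statistics.mean(a))
        scores.append(diff / denom if denom else 0.0)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Stand-in classifier (not an SVM or C4.5): pick the class with the closest centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [statistics.mean(r[g] for r in rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy; genes are re-selected inside each fold to avoid selection bias."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_top_k(signal_to_noise(X_tr, y_tr), k)
        if nearest_centroid_predict(X_tr, y_tr, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic()
    for k in (1, 5, 50, 200):  # k = 200 means no gene selection at all
        print(f"genes kept: {k:3d}  LOOCV accuracy: {loocv_accuracy(X, y, k):.2f}")
```

Re-ranking the genes inside each cross-validation fold is a deliberate choice: selecting genes on the full dataset before splitting would leak test information into the selection step and overstate accuracy.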
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Z., Li, J., Hu, H., Zhou, H. (2010). On the Effectiveness of Gene Selection for Microarray Classification Methods. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5991. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12101-2_31
DOI: https://doi.org/10.1007/978-3-642-12101-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12100-5
Online ISBN: 978-3-642-12101-2
eBook Packages: Computer Science (R0)