Abstract
Because a typical microarray dataset measures a large number of genes, feature selection plays an essential role in tumor classification. Relevance and redundancy, in turn, are key components in determining the optimal predictor set. However, a third component, the relative weights given to the first two, assumes equal, if not greater, importance in feature selection. Based on this third component, we developed two novel feature selection methods capable of producing high, unbiased classification accuracy on multiclass microarray datasets. An in-depth analysis comparing the two methods also estimates the optimal values of the relative weights.
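The abstract does not give the scoring formulas, so the following is only an illustrative sketch of the general idea of weighting relevance against redundancy, not the two methods proposed in the paper. Everything in it is an assumption made for illustration: the function names, the weight parameter `alpha`, the use of a between-class to within-class sum-of-squares ratio as the relevance measure, the use of mean absolute Pearson correlation as the redundancy measure, and the toy data.

```python
import numpy as np

def relevance(X, y):
    """Per-feature ratio of between-class to within-class sum of squares."""
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        bss += len(Xc) * (mu - overall) ** 2
        wss += ((Xc - mu) ** 2).sum(axis=0)
    return bss / (wss + 1e-12)  # small constant avoids division by zero

def select_features(X, y, k, alpha):
    """Greedy forward selection with a hypothetical weighted score.

    alpha in [0, 1] is the weight on relevance; (1 - alpha) is the weight
    on the redundancy penalty. This is an assumed scoring rule, not the
    one defined in the paper.
    """
    rel = relevance(X, y)
    rel = rel / rel.max()  # scale to [0, 1] so the two terms are comparable
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    selected = [int(np.argmax(rel))]  # start from the most relevant feature
    while len(selected) < k:
        red = corr[:, selected].mean(axis=1)  # mean |r| with chosen features
        score = alpha * rel - (1 - alpha) * red
        score[selected] = -np.inf  # never pick the same feature twice
        selected.append(int(np.argmax(score)))
    return selected

# Toy data (hypothetical): 6 samples, 3 classes, 3 features.
# Features 0 and 1 are near-duplicates; feature 2 is less relevant
# but almost uncorrelated with them.
y = np.array([0, 0, 1, 1, 2, 2])
X = np.array([
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [1.0, 1.1, 1.0],
    [1.1, 1.0, 1.1],
    [2.0, 2.1, 0.0],
    [2.1, 2.0, 0.1],
])

relevance_only = select_features(X, y, k=2, alpha=1.0)
balanced = select_features(X, y, k=2, alpha=0.5)
print(relevance_only, balanced)
```

On this toy data, weighting relevance alone keeps both near-duplicate features, while an equal weighting swaps one of them for the less correlated feature. This shows why the choice of weight can matter as much as the relevance and redundancy measures themselves.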
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ooi, C.H., Chetty, M. (2005). A Comparative Study of Two Novel Predictor Set Scoring Methods. In: Gallagher, M., Hogan, J.P., Maire, F. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2005. IDEAL 2005. Lecture Notes in Computer Science, vol 3578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508069_56
DOI: https://doi.org/10.1007/11508069_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26972-4
Online ISBN: 978-3-540-31693-0
eBook Packages: Computer Science, Computer Science (R0)