Abstract
This work tackles the problem of building predictive models from the results of DNA microarray experiments. It discusses the data analysis challenges posed by the high dimensionality of the data and the small number of samples usually available from such experiments. A method is proposed that adaptively selects the number of genes used as features of a predictive model in order to avoid overfitting, which appears to be the major risk in microarray studies. The proposed approach is illustrated by a numerical example based on gene expression profiles from two types of acute leukemia (data originally published by Golub et al.).
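The abstract does not spell out the selection procedure, so the following is only a generic sketch of the idea, not the author's method. It assumes that genes are ranked by the signal-to-noise score (mean difference divided by the sum of class standard deviations, the score used by Golub et al.), that a nearest-centroid classifier stands in for the predictive model (a swapped-in choice), and that the number of genes k is chosen by leave-one-out cross-validation in which the ranking is recomputed inside every fold. All function names, the candidate grid of k values and the synthetic data are hypothetical illustrations.

```python
import random
import statistics


def s2n(X, y, j):
    """Signal-to-noise score of gene j: (mean_0 - mean_1) / (sd_0 + sd_1)."""
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    denom = statistics.pstdev(a) + statistics.pstdev(b)
    # Guard against a zero denominator for constant genes.
    return (statistics.fmean(a) - statistics.fmean(b)) / (denom or 1e-12)


def rank_genes(X, y):
    """Gene indices sorted by decreasing |S2N| computed on (X, y) only."""
    scores = [abs(s2n(X, y, j)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid(X, y, x, genes):
    """Predict the label of x from class centroids restricted to `genes`."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        dist = sum((x[g] - statistics.fmean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_errors(X, y, ks):
    """Leave-one-out error count for each candidate number of genes k.

    Genes are re-ranked inside every fold without the held-out sample, so the
    gene selection step never sees the sample it is evaluated on.
    """
    errors = {k: 0 for k in ks}
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        ranking = rank_genes(X_tr, y_tr)
        for k in ks:
            if nearest_centroid(X_tr, y_tr, X[i], ranking[:k]) != y[i]:
                errors[k] += 1
    return errors


def select_k(X, y, ks):
    """Smallest k with the lowest leave-one-out error (ties favour fewer genes)."""
    errors = loo_errors(X, y, ks)
    return min(ks, key=lambda k: (errors[k], k)), errors


def make_data(n_per_class=10, p=200, informative=5, shift=1.5, seed=0):
    """Hypothetical synthetic two-class data: only the first genes differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                for j in range(informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    best_k, errors = select_k(X, y, ks=[1, 2, 5, 10, 20, 50, 100, 200])
    print("LOO errors per k:", errors)
    print("selected number of genes:", best_k)
```

Run as a script, this prints the leave-one-out error for each candidate k and the selected k. Re-ranking the genes inside each fold is the key design choice: ranking them once on the full data set before cross-validation lets the held-out samples influence feature selection and yields optimistically biased error estimates, which is exactly the overfitting risk the abstract warns about when samples are few and genes are many.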
References
Bittner M, Meltzer P, Chen Y (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406:536–540
Dudoit S, Shaffer J, Boldrick J (2002) Multiple Hypothesis Testing in Microarray Experiments. UC Berkeley Division of Biostatistics Working Paper Series, Paper 110
Eisen M, et al. (1998) Proc. Natl. Acad. Sci. USA 95:14863–14868
Everitt B (1980) Cluster Analysis, Second Edition. Heinemann Educational Books Ltd., London
Ewens W, Grant G (2001) Statistical Methods in Bioinformatics. Springer, Berlin Heidelberg New York
Faller D, et al. (2003) Journal of Computational Biology 10:751–762
Golub T, et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531–537
Hastie T, Tibshirani R, Friedman J (2002) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin Heidelberg New York
Hoffmann R, Seidl T, Dugas M (2002) Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biology
Maciejewski H, Jasinska A (2005) Clustering DNA microarray data. Computer recognition systems CORES 05, Springer Advances in Soft Computing
Maciejewski H, Konarski L (2007) Building a predictive model from data in high dimensions with application to analysis of microarray experiments. DepCoS — RELCOMEX, IEEE Computer Society Press
MAQC Consortium [Shi L. et al.] (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology 24
Markowetz F, Spang R (2005) Molecular diagnosis: Classification, model selection and performance evaluation. Methods Inf. Med. 44:438–443
Quackenbush J (2001) Nature Reviews Genetics 2:418–427
Shannon W, Culverhouse R, Duncann J (2003) Pharmacogenomics 4:41–51
Tamayo P, et al. (1999) Proc. Natl. Acad. Sci. USA 96:2907–2912
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maciejewski, H. (2007). Adaptive Selection of Feature Set Dimensionality for Classification of DNA Microarray Samples. In: Kurzynski, M., Puchala, E., Wozniak, M., Zolnierek, A. (eds) Computer Recognition Systems 2. Advances in Soft Computing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75175-5_103
DOI: https://doi.org/10.1007/978-3-540-75175-5_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75174-8
Online ISBN: 978-3-540-75175-5
eBook Packages: Engineering, Engineering (R0)