Abstract
We propose a new method for handling missing values in gene expression data. The method is applied to improve the quality of clustering genes with respect to their functionality. Calculations are run on real-life data within the framework of self-organizing maps. The gene distances used correspond to the rank-based Spearman correlation and an entropy-based information measure.
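The abstract does not spell out the proposed missing-value method, so the following is only a generic illustration of how a rank-based Spearman distance between two gene profiles might be computed when some measurements are absent. It assumes pairwise deletion (only experiments observed for both genes are used), which is a common baseline and not necessarily the authors' approach. The function names, the use of `None` for a missing value, and the minimum of three shared observations are all assumptions made for this sketch.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman_distance(gene_a, gene_b, min_shared=3):
    """Distance 1 - rho, using only experiments observed for both genes.

    Missing measurements are encoded as None (an assumption of this sketch).
    Returns NaN when fewer than `min_shared` experiments are shared.
    """
    pairs = [(a, b) for a, b in zip(gene_a, gene_b)
             if a is not None and b is not None]
    if len(pairs) < min_shared:
        return float("nan")
    x, y = zip(*pairs)
    rho = pearson(average_ranks(list(x)), average_ranks(list(y)))
    return 1.0 - rho


if __name__ == "__main__":
    # Hypothetical expression profiles over five experiments.
    g1 = [1.0, None, 3.0, 4.0, 5.0]
    g2 = [2.0, 9.0, 6.0, 8.0, 10.0]
    print(spearman_distance(g1, g2))
```

Under this convention the distance ranges from 0 (identical rank order) to 2 (fully reversed rank order). Pairwise deletion makes each distance depend on a different subset of experiments, which is one reason dedicated missing-value methods are of interest for clustering.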
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grużdź, A., Ihnatowicz, A., Ślęzak, D. (2005). Gene Expression Clustering: Dealing with the Missing Values. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_63
DOI: https://doi.org/10.1007/3-540-32392-9_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)