Gene Expression Clustering: Dealing with the Missing Values

Grużdź, Alicja; Ihnatowicz, Aleksandra; Ślęzak, Dominik

doi:10.1007/3-540-32392-9_63

Part of the book series: Advances in Soft Computing (AINSC, volume 31)

Abstract

We propose a new method for dealing with missing values in gene expression data. It is applied to improve the quality of clustering genes with respect to their functionality. Calculations are run on real-life data within the framework of self-organizing maps. The applied gene distances correspond to the rank-based Spearman correlation and an entropy-based information measure.
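
As a rough illustration of a rank-based gene distance in the presence of missing values, the sketch below computes one minus the Spearman correlation over only those experiments where both genes have a measurement (pairwise deletion). This is a common baseline and is not the method proposed in the chapter, which the abstract does not describe. The function names, the use of `None` for a missing value, and the minimum of three shared observations are all assumptions made for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def spearman_distance(gene_a, gene_b, min_shared=3):
    """1 - Spearman rho over experiments observed in both profiles.

    Missing values are encoded as None (an assumption of this sketch).
    Returns NaN when fewer than `min_shared` experiments are shared.
    """
    pairs = [(a, b) for a, b in zip(gene_a, gene_b)
             if a is not None and b is not None]
    if len(pairs) < min_shared:
        return float("nan")
    xs, ys = zip(*pairs)
    rho = pearson(average_ranks(list(xs)), average_ranks(list(ys)))
    return 1.0 - rho


if __name__ == "__main__":
    gene_a = [0.1, None, 0.5, 0.9, 1.4]
    gene_b = [1.0, 2.0, 3.0, None, 7.0]
    print(spearman_distance(gene_a, gene_b))
```

The distance ranges from 0 (identical rank order) to 2 (reversed rank order). Pairwise deletion discards information and makes distances between different gene pairs rest on different subsets of experiments, which is one reason dedicated missing-value methods are of interest.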

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
Alicja Grużdź, Aleksandra Ihnatowicz & Dominik Ślęzak
Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Alicja Grużdź, Aleksandra Ihnatowicz & Dominik Ślęzak

Editor information

Editors and Affiliations

Institute of Computer Sciences, Polish Academy of Sciences, ul. Ordona 21, 01-237, Warszawa, Poland
Mieczysław A. Kłopotek, Sławomir T. Wierzchoń & Krzysztof Trojanowski

About this paper

Cite this paper

Grużdź, A., Ihnatowicz, A., Ślęzak, D. (2005). Gene Expression Clustering: Dealing with the Missing Values. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_63

DOI: https://doi.org/10.1007/3-540-32392-9_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)
