Gene Expression Clustering: Dealing with the Missing Values

  • Conference paper
Intelligent Information Processing and Web Mining

Part of the book series: Advances in Soft Computing (AINSC, volume 31)

Abstract

We propose a new method for handling missing values in gene expression data. The method is applied to improve the quality of clustering genes with respect to their functionality. Calculations are run on real-life data within the framework of self-organizing maps. The gene distances used correspond to the rank-based Spearman correlation and an entropy-based information measure.
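
The abstract does not describe how the proposed method treats missing values, so the following is only a minimal sketch of a common baseline, not the authors' method: a Spearman-based gene distance computed over the experiments in which both genes were measured (pairwise deletion). The function names, the use of None to mark a missing measurement, the 1 - rho distance, and the min_shared threshold are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if one is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman_distance(gene_a, gene_b, min_shared=3):
    """Assumed distance 1 - rho, using only experiments observed for both genes.

    Returns None when fewer than min_shared experiments are shared or when
    the correlation is undefined (one profile is constant on the shared part).
    """
    pairs = [(a, b) for a, b in zip(gene_a, gene_b)
             if a is not None and b is not None]
    if len(pairs) < min_shared:
        return None
    xs, ys = zip(*pairs)
    rho = pearson(average_ranks(xs), average_ranks(ys))
    return None if rho is None else 1.0 - rho


# Hypothetical expression profiles over five experiments (None = missing).
g1 = [0.1, None, 0.5, 0.9, 1.3]
g2 = [1.0, 2.0, None, 3.0, 4.5]
g3 = [4.5, 3.0, 2.0, None, 1.0]

d_similar = spearman_distance(g1, g2)   # shared experiments rise together
d_opposite = spearman_distance(g1, g3)  # shared experiments move oppositely
```

Under these assumptions the distance lies in [0, 2], with 0 for profiles that are perfectly rank-concordant on the shared experiments. Pairwise deletion discards information whenever the two genes miss different experiments, which is one reason dedicated missing-value methods are of interest.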

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grużdź, A., Ihnatowicz, A., Ślęzak, D. (2005). Gene Expression Clustering: Dealing with the Missing Values. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_63

  • DOI: https://doi.org/10.1007/3-540-32392-9_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25056-2

  • Online ISBN: 978-3-540-32392-1

  • eBook Packages: Engineering, Engineering (R0)
