Incorporating Biological Domain Knowledge into Cluster Validity Assessment

  • Conference paper
Applications of Evolutionary Computing (EvoWorkshops 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Abstract

This paper presents an approach for assessing cluster validity based on similarity knowledge extracted from the Gene Ontology (GO) and from databases annotated to the GO. A knowledge-driven cluster validity assessment system for microarray data was implemented. Different methods were applied to measure similarity between yeast gene products based on the GO. This research proposes two methods for calculating cluster validity indices using GO-driven similarity. The first approach processes overall similarity values, which are calculated by taking into account the combined annotations originating from the three GO hierarchies. The second approach is based on GO hierarchy-independent similarity values, that is, values computed separately from each of these hierarchies. A traditional node-counting method and an information content technique have been implemented to measure knowledge-based similarity between gene products (biological distances). The results contribute to the evaluation of clustering outcomes and the identification of optimal cluster partitions; the approach may therefore be an effective tool to support biomedical knowledge discovery in gene expression data analysis.
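The abstract combines two ideas: knowledge-based similarity between gene products (a node-counting measure and an information content measure over the GO), and cluster validity scores computed from those similarities either with all three hierarchies pooled or one hierarchy at a time. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation. The toy ontology, annotation counts and gene names (`g1` to `g4`) are hypothetical. The node-counting formula follows Wu and Palmer and the information content formula follows Resnik; both are standard choices, but this abstract does not confirm the exact formulas used. Averaging pairwise term similarities into a gene similarity, and the simple `compactness_ratio` used as a validity index, are illustrative stand-ins rather than the paper's own definitions.

```python
import math

# Hypothetical toy ontology: three roots stand in for the three GO hierarchies
# (biological process, molecular function, cellular component). Each term maps
# to its parents via is_a links. None of this is real GO data.
PARENTS = {
    "BP": set(), "bp1": {"BP"}, "bp2": {"BP"}, "bp1a": {"bp1"}, "bp1b": {"bp1"},
    "MF": set(), "mf1": {"MF"}, "mf2": {"MF"},
    "CC": set(), "cc1": {"CC"}, "cc2": {"CC"},
}
# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {"bp1a": 3, "bp1b": 1, "bp2": 4, "mf1": 5, "mf2": 5, "cc1": 2, "cc2": 2}

def ancestors(term):
    """The term itself plus every ancestor reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def root_of(term):
    """The root term, i.e. the hierarchy, that a term belongs to."""
    return next(a for a in ancestors(term) if not PARENTS[a])

def depth(term):
    """Number of nodes on the longest path from the root (root has depth 1)."""
    return 1 + max((depth(p) for p in PARENTS[term]), default=0)

def node_counting_sim(t1, t2):
    """Wu-Palmer style: 2 * depth(deepest common ancestor) / (depth(t1) + depth(t2))."""
    common = ancestors(t1) & ancestors(t2)
    if not common:  # terms from different hierarchies share no ancestor
        return 0.0
    return 2 * max(depth(c) for c in common) / (depth(t1) + depth(t2))

def term_prob(term):
    """Share of a hierarchy's annotations that fall on the term or its descendants."""
    root = root_of(term)
    total = sum(n for t, n in DIRECT_COUNTS.items() if root_of(t) == root)
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total

def resnik_sim(t1, t2):
    """Resnik style: information content -log p(c) of the most informative
    common ancestor. Assumes every common ancestor has at least one annotation."""
    common = ancestors(t1) & ancestors(t2)
    return max((-math.log(term_prob(c)) for c in common), default=0.0)

def gene_sim(terms1, terms2, term_sim, hierarchy=None):
    """Average term similarity over annotation pairs from the same hierarchy.
    hierarchy=None pools all three hierarchies (overall similarity); a root
    name such as "BP" keeps that hierarchy only (per-hierarchy similarity).
    Returning 0.0 when no comparable pair exists is an assumption."""
    pairs = [(a, b) for a in terms1 for b in terms2
             if root_of(a) == root_of(b) and hierarchy in (None, root_of(a))]
    if not pairs:
        return 0.0
    return sum(term_sim(a, b) for a, b in pairs) / len(pairs)

def compactness_ratio(clusters, genes, hierarchy=None):
    """Mean within-cluster / mean between-cluster biological distance, with
    distance = 1 - node-counting gene similarity. Lower is better. Needs at
    least one within-cluster and one between-cluster pair."""
    label = {g: i for i, members in enumerate(clusters) for g in members}
    names = sorted(label)
    within, between = [], []
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            d = 1 - gene_sim(genes[g], genes[h], node_counting_sim, hierarchy)
            (within if label[g] == label[h] else between).append(d)
    return (sum(within) / len(within)) / (sum(between) / len(between))

GENES = {  # hypothetical gene products and their term annotations
    "g1": {"bp1a", "mf1"}, "g2": {"bp1b", "mf1"},
    "g3": {"bp2", "cc1"}, "g4": {"bp2", "cc2", "mf2"},
}
good = [["g1", "g2"], ["g3", "g4"]]
poor = [["g1", "g3"], ["g2", "g4"]]
for h in (None, "BP"):  # overall score, then the BP hierarchy alone
    print(h, compactness_ratio(good, GENES, h), compactness_ratio(poor, GENES, h))
```

With this toy data the partition that groups `g1` with `g2` and `g3` with `g4` receives the lower (better) ratio, both with pooled annotations and with the BP hierarchy alone; passing `"MF"` or `"CC"` gives the other per-hierarchy scores. A real analysis would load the GO structure and yeast annotation files instead of the hand-written dictionaries.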

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolshakova, N., Azuaje, F., Cunningham, P. (2006). Incorporating Biological Domain Knowledge into Cluster Validity Assessment. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_2

  • DOI: https://doi.org/10.1007/11732242_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33237-4

  • Online ISBN: 978-3-540-33238-1

  • eBook Packages: Computer Science, Computer Science (R0)