ABSTRACT
Ontologies represent data relationships as hierarchies of possibly overlapping classes. Ontologies are closely related to clustering hierarchies, and in this article we explore this relationship in depth. In particular, we examine the space of ontologies that can be generated by pairwise dissimilarity matrices. We demonstrate that classical clustering algorithms, which take dissimilarity matrices as inputs, do not incorporate all available information. In fact, only special types of dissimilarity matrices can be exactly preserved by previous clustering methods. We model an ontology as a partially ordered set (poset) of classes ordered by the subset relation, and we propose a new clustering algorithm that generates a partially ordered set of clusters from a dissimilarity matrix.
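As a rough illustration of what a poset of possibly overlapping clusters derived from a dissimilarity matrix can look like, the Python sketch below collects the maximal cliques of each threshold graph (objects joined when their dissimilarity is at most a threshold) and orders them by the subset relation. This construction is an assumption chosen for illustration and is not claimed to be the algorithm proposed in the article; the function names `maximal_cliques` and `cluster_poset` are hypothetical.

```python
from itertools import combinations


def maximal_cliques(n, adjacent):
    """Return all maximal cliques of a graph on vertices 0..n-1.

    Basic Bron-Kerbosch recursion without pivoting; `adjacent(u, v)`
    is a symmetric predicate on distinct vertices.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            nbrs = {u for u in range(n) if u != v and adjacent(u, v)}
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return cliques


def cluster_poset(d):
    """Build clusters and their strict-subset order from a symmetric matrix d.

    Assumption for illustration: a cluster is a singleton or a maximal
    clique of some threshold graph {(u, v) : d[u][v] <= t}.
    """
    n = len(d)
    thresholds = sorted({d[i][j] for i, j in combinations(range(n), 2)})
    clusters = {frozenset([i]) for i in range(n)}
    for t in thresholds:
        clusters.update(maximal_cliques(n, lambda u, v: d[u][v] <= t))
    # Pairs (a, b) with a a proper subset of b form the partial order.
    order = {(a, b) for a in clusters for b in clusters if a < b}
    return clusters, order


if __name__ == "__main__":
    # Object 1 is close to both 0 and 2, but 0 and 2 are far apart.
    d = [[0, 1, 2],
         [1, 0, 1],
         [2, 1, 0]]
    clusters, order = cluster_poset(d)
    for c in sorted(clusters, key=lambda c: (len(c), sorted(c))):
        print(sorted(c))
```

In this small example the clusters {0, 1} and {1, 2} overlap in object 1, so the result is a poset rather than a tree; a dendrogram would have to merge one of the two pairs first and lose the other.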
Index Terms
- Clustering pair-wise dissimilarity data into partially ordered sets