DOI: 10.1145/1150402.1150480
Article

Clustering pair-wise dissimilarity data into partially ordered sets

Published: 20 August 2006

ABSTRACT

Ontologies represent data relationships as hierarchies of possibly overlapping classes. Ontologies are closely related to clustering hierarchies, and in this paper we explore this relationship in depth. In particular, we examine the space of ontologies that can be generated from pairwise dissimilarity matrices. We demonstrate that classical clustering algorithms, which take dissimilarity matrices as inputs, do not incorporate all available information. In fact, only special types of dissimilarity matrices can be exactly preserved by previous clustering methods. We model ontologies as partially ordered sets (posets) under the subset relation, and we propose a new clustering algorithm that generates a partially ordered set of clusters from a dissimilarity matrix.
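To make the idea of a poset of overlapping clusters concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes one simple construction: for each distinct dissimilarity value, link every pair of items at or below that value and take the maximal cliques of the resulting graph as clusters, then order all clusters by set inclusion. The function names `maximal_cliques` and `poset_clusters` are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def poset_clusters(d):
    """Return (clusters, covers) for a symmetric dissimilarity matrix d.

    Illustrative assumption: clusters are the maximal cliques of each
    threshold graph; covers are the edges of the Hasse diagram under
    the subset relation.
    """
    n = len(d)
    thresholds = sorted({d[i][j] for i, j in combinations(range(n), 2)})
    clusters = {frozenset([i]) for i in range(n)}
    for t in thresholds:
        adj = {i: {j for j in range(n) if j != i and d[i][j] <= t}
               for i in range(n)}
        clusters.update(maximal_cliques(adj))
    clusters = sorted(clusters, key=lambda c: (len(c), sorted(c)))
    # a is covered by b if a is a proper subset of b with nothing in between
    covers = [(a, b) for a in clusters for b in clusters
              if a < b and not any(a < c < b for c in clusters)]
    return clusters, covers


if __name__ == "__main__":
    # Item 1 is close to both 0 and 2, but 0 and 2 are far apart.
    d = [[0, 1, 2],
         [1, 0, 1],
         [2, 1, 0]]
    clusters, covers = poset_clusters(d)
    for a, b in covers:
        print(sorted(a), "is covered by", sorted(b))
```

In this toy matrix the clusters {0, 1} and {1, 2} overlap in item 1, so the result is a poset rather than a tree. A strictly hierarchical method would have to place item 1 in only one of the two groups at that level, which illustrates the kind of information loss the abstract refers to.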


Published in

KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402

Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
