Article

Clustering pair-wise dissimilarity data into partially ordered sets

Authors:
Jinze Liu, University of North Carolina, Chapel Hill, NC
Qi Zhang, University of North Carolina, Chapel Hill, NC
Wei Wang, University of North Carolina, Chapel Hill, NC
Leonard McMillan, University of North Carolina, Chapel Hill, NC
Jan Prins, University of North Carolina, Chapel Hill, NC

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2006, Pages 637–642
https://doi.org/10.1145/1150402.1150480

Published: 20 August 2006
ABSTRACT

Ontologies represent data relationships as hierarchies of possibly overlapping classes. They are closely related to clustering hierarchies, and in this paper we explore that relationship in depth. In particular, we examine the space of ontologies that can be generated from pairwise dissimilarity matrices. We demonstrate that classical clustering algorithms, which take dissimilarity matrices as input, do not incorporate all of the available information; in fact, only special types of dissimilarity matrices can be exactly preserved by previous clustering methods. We model an ontology as a partially ordered set (poset) under the subset relation, and we propose a new clustering algorithm that generates a partially ordered set of clusters from a dissimilarity matrix.
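The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a dissimilarity matrix can yield a poset of overlapping clusters. It assumes one plausible construction: for each distinct dissimilarity value, build a threshold graph and take its maximal cliques as clusters, then order the clusters by the subset relation. The function names `maximal_cliques` and `poset_of_clusters` and the toy matrix are hypothetical, not taken from the paper.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with a basic Bron-Kerbosch search (no pivoting).

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def poset_of_clusters(d):
    """Assumed construction: clusters are the maximal cliques of the
    threshold graphs of the symmetric dissimilarity matrix d.

    Returns (clusters, order), where order holds the pairs (a, b)
    with a a proper subset of b.
    """
    n = len(d)
    thresholds = sorted({d[i][j] for i, j in combinations(range(n), 2)})
    clusters = set()
    for t in thresholds:
        # Connect two objects when their dissimilarity is at most t.
        adj = {i: {j for j in range(n) if j != i and d[i][j] <= t}
               for i in range(n)}
        for c in maximal_cliques(adj):
            if len(c) > 1:  # ignore singletons
                clusters.add(c)
    order = {(a, b) for a in clusters for b in clusters if a < b}
    return clusters, order


# Toy matrix: object 1 is close to both 0 and 2, but 0 and 2 are far apart.
d = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
clusters, order = poset_of_clusters(d)
```

On this toy input the clusters {0, 1} and {1, 2} overlap in object 1, which a strict tree hierarchy cannot express; both sit below {0, 1, 2} in the subset order.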

Index Terms

Information systems → Information systems applications → Data mining

Published in
KDD '06 proceedings, August 2006, 986 pages
ISBN: 1595933395
DOI: 10.1145/1150402
Conference Chair: Tina Eliassi-Rad (LLNL)
General Chair: Lyle Ungar (University of Pennsylvania)
Program Chairs: Mark Craven (University of Wisconsin), Dimitrios Gunopulos (University of California, Riverside)
Copyright © 2006 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
PoCluster
clustering
dissimilarity
poset