Article

Relational clustering by symmetric convex coding

Authors:
Bo Long, SUNY Binghamton, Binghamton, NY
Zhongfei (Mark) Zhang, SUNY Binghamton, Binghamton, NY
Xiaoyun Wu, Google Inc, Mountain View, CA
Philip S. Yu, IBM Watson Research Center, Hawthorne, NY

ICML '07: Proceedings of the 24th International Conference on Machine Learning, June 2007, pages 569–576. https://doi.org/10.1145/1273496.1273568

Published: 20 June 2007

ABSTRACT

Relational data appear frequently in machine learning applications. They consist of pairwise relations (similarities or dissimilarities) between implicit objects, are usually stored in relation matrices, and typically come with no other knowledge about the objects. Although relational clustering can be formulated as graph partitioning in some applications, this formulation is not adequate for general relational data. In this paper, we propose a general model for relational clustering based on symmetric convex coding. The model is applicable to all types of relational data and unifies the existing graph partitioning formulation. Under this model, we derive two alternative bound optimization algorithms that solve symmetric convex coding under two popular distance functions: Euclidean distance and generalized I-divergence. Experimental evaluation and theoretical analysis show the effectiveness and great potential of the proposed model and algorithms.
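The abstract describes the model only in words. As a minimal sketch, assume the formulation A ≈ C B C^T, where A is the n×n symmetric relation matrix, C is an n×k nonnegative membership matrix whose rows each sum to one (the "convex coding" constraint), and B is a k×k nonnegative symmetric matrix of cluster-level relations. This notation is an assumption for illustration, as are the hypothetical helper names `euclidean_loss`, `i_divergence`, and `fit_scc_euclidean`, and the use of NumPy. The code evaluates the two distance functions named above and fits the Euclidean case with generic multiplicative updates followed by row renormalization. These updates are a heuristic stand-in, not the bound optimization algorithms derived in the paper, and they are not guaranteed to decrease the objective monotonically.

```python
import numpy as np

# Hypothetical helpers for the ASSUMED model A ≈ C B C^T, where C (n x k) is
# nonnegative with rows summing to 1 and B (k x k) is nonnegative, symmetric.
# This is NOT the paper's bound optimization algorithm.


def euclidean_loss(A, C, B):
    """Squared Euclidean (Frobenius) distance between A and C B C^T."""
    return float(np.sum((A - C @ B @ C.T) ** 2))


def i_divergence(A, Y, eps=1e-12):
    """Generalized I-divergence sum(A*log(A/Y) - A + Y), with 0*log(0) = 0."""
    A = np.asarray(A, dtype=float)
    Y = np.maximum(np.asarray(Y, dtype=float), eps)  # avoid log(0) and /0
    safe_A = np.where(A > 0, A, 1.0)
    log_term = np.where(A > 0, A * np.log(safe_A / Y), 0.0)
    return float(np.sum(log_term - A + Y))


def fit_scc_euclidean(A, k, n_iter=300, seed=0, eps=1e-12):
    """Heuristic multiplicative updates for min ||A - C B C^T||^2.

    Each C update is followed by row renormalization to enforce the
    convex-coding constraint, so monotone descent is NOT guaranteed.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    C = rng.random((n, k)) + 0.1           # strictly positive start
    C /= C.sum(axis=1, keepdims=True)      # rows sum to 1
    B = rng.random((k, k)) + 0.1
    B = (B + B.T) / 2                      # symmetric start
    for _ in range(n_iter):
        CtC = C.T @ C
        # B update: ratio of the negative and positive parts of the gradient
        # -2 C^T A C + 2 C^T C B C^T C.
        B *= (C.T @ A @ C) / (CtC @ B @ CtC + eps)
        B = (B + B.T) / 2                  # keep B exactly symmetric
        CB = C @ B
        # C update: gradient-part ratio A C B / (C B C^T C B), damped by a
        # fourth root (an assumed choice; other exponents are also used).
        C *= ((A @ CB) / (CB @ (C.T @ CB) + eps)) ** 0.25
        C /= np.maximum(C.sum(axis=1, keepdims=True), eps)
    return C, B


if __name__ == "__main__":
    # Toy relation matrix: two blocks of three mutually related objects.
    A = np.kron(np.eye(2), np.ones((3, 3)))
    C, B = fit_scc_euclidean(A, k=2)
    print("hard labels:", C.argmax(axis=1))
    print("Euclidean loss:", euclidean_loss(A, C, B))
    print("I-divergence:", i_divergence(A, C @ B @ C.T))
```

Taking `C.argmax(axis=1)` turns the soft memberships into hard cluster labels. No particular output is claimed for the toy run, since the heuristic's result depends on the random start. B is re-symmetrized after every update so that C B C^T stays symmetric, matching the symmetry of the relation matrix.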

Index Terms
Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning

Published in
ICML '07: Proceedings of the 24th International Conference on Machine Learning, June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496
Editor: Zoubin Ghahramani, University of Cambridge, United Kingdom
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall acceptance rate: 140 of 548 submissions, 26%
Article Metrics
- Total citations: 24
- Total downloads: 401
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 1