ABSTRACT
Relational data appear frequently in many machine learning applications. Relational data consist of the pairwise relations (similarities or dissimilarities) between implicit objects; they are usually stored in relation matrices, and typically no other knowledge about the objects is available. Although relational clustering can be formulated as graph partitioning in some applications, this formulation is not adequate for general relational data. In this paper, we propose a general model for relational clustering based on symmetric convex coding. The model is applicable to all types of relational data and unifies the existing graph partitioning formulations. Under this model, we derive two alternative bound optimization algorithms that solve the symmetric convex coding problem under two popular distance functions, Euclidean distance and generalized I-divergence. Experimental evaluation and theoretical analysis show the effectiveness and great potential of the proposed model and algorithms.
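To make the model concrete, the sketch below illustrates one possible reading of symmetric convex coding under Euclidean distance. It assumes (the abstract does not spell this out) that a symmetric nonnegative relation matrix A is approximated by C B C^T, where each row of the nonnegative matrix C sums to one and acts as a soft cluster membership, and the nonnegative matrix B holds cluster-to-cluster relation strengths. The optimizer is a generic projected gradient method with backtracking, swapped in for simplicity; it is not the bound optimization algorithm derived in the paper. The function names, the initialization, and the toy matrix are all hypothetical.

```python
import numpy as np


def project_rows_to_simplex(M):
    """Euclidean projection of every row of M onto the probability simplex."""
    n, k = M.shape
    U = -np.sort(-M, axis=1)                  # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (U - css / idx > 0).sum(axis=1)     # size of the support of each row
    theta = css[np.arange(n), rho - 1] / rho  # per-row shift
    return np.maximum(M - theta[:, None], 0.0)


def scc_projected_gradient(A, k, n_iter=200, seed=0):
    """Approximately minimize ||A - C B C^T||_F^2 (assumed objective).

    Constraints: C >= 0 with rows summing to 1, and B >= 0.
    Generic projected gradient with backtracking, NOT the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    C = project_rows_to_simplex(rng.random((n, k)))
    B = rng.random((k, k))
    B = (B + B.T) / 2.0

    def loss(C_, B_):
        return float(np.sum((A - C_ @ B_ @ C_.T) ** 2))

    history = [loss(C, B)]
    step = 1.0
    for _ in range(n_iter):
        R = C @ B @ C.T - A                          # residual
        grad_C = 2.0 * (R @ C @ B.T + R.T @ C @ B)
        grad_B = 2.0 * (C.T @ R @ C)
        # Backtracking: halve the step until the loss does not increase.
        while step > 1e-12:
            C_new = project_rows_to_simplex(C - step * grad_C)
            B_new = np.maximum(B - step * grad_B, 0.0)
            new_loss = loss(C_new, B_new)
            if new_loss <= history[-1]:
                break
            step *= 0.5
        else:
            break                                    # no acceptable step found
        C, B = C_new, B_new
        history.append(new_loss)
        step *= 2.0                                  # try a larger step next time
    return C, B, history


# Toy similarity matrix with two dense diagonal blocks (made-up data).
A = np.kron(np.eye(2), np.ones((3, 3))) + 0.05
C, B, history = scc_projected_gradient(A, k=2)
labels = C.argmax(axis=1)  # hard assignment read off the soft memberships
```

Because a step is accepted only when the loss does not increase, the recorded loss history is non-increasing by construction. The result is at best a local solution and depends on the random initialization.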
- Relational clustering by symmetric convex coding