Abstract
Clustering plays an important role in data mining, as many applications use it as a preprocessing step for data analysis. Traditional clustering focuses on grouping similar objects, while two-way co-clustering groups dyadic data (objects and their attributes) simultaneously. Most co-clustering research considers only a single correlation matrix, yet dyadic data often come with additional descriptions that could improve co-clustering performance. In this research, we extend ITCC (Information Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix. We propose CCAM (Co-Clustering with Augmented Matrix) to incorporate this augmented data for better co-clustering. We apply CCAM to the analysis of online advertising, where both ads and users must be clustered. The key data connecting ads and users is the user-ad link matrix, which records the ads each user has linked to; in addition, both ads and users have their own feature data, i.e., the augmented matrices. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiments use advertisement and user data from Morgenstern, a financial social website that focuses on advertising agency services. The results show that CCAM outperforms ITCC because it exploits the augmented matrices during clustering.
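As background for the K-L divergence measure used in the evaluation, the following minimal Python sketch builds the co-cluster approximation \(q(a,u)=p(\hat{a},\hat{u})\,p(a\mid\hat{a})\,p(u\mid\hat{u})\) defined for ITCC in [9] and measures how far it departs from the original joint distribution; all function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def itcc_approximation(p, row_labels, col_labels, k, l):
    """Co-cluster approximation q(a,u) = p(a^,u^) p(a|a^) p(u|u^) from ITCC [9].

    p           -- joint distribution matrix (nonnegative, sums to 1)
    row_labels  -- cluster index of each row (ad), values in 0..k-1
    col_labels  -- cluster index of each column (user), values in 0..l-1
    """
    p_row, p_col = p.sum(axis=1), p.sum(axis=0)          # marginals p(a), p(u)
    p_rc = np.array([p_row[row_labels == i].sum() for i in range(k)])  # p(a^)
    p_cc = np.array([p_col[col_labels == j].sum() for j in range(l)])  # p(u^)
    p_block = np.array([[p[np.ix_(row_labels == i, col_labels == j)].sum()
                         for j in range(l)] for i in range(k)])        # p(a^,u^)
    # q(a,u) = p(a^,u^) * (p(a)/p(a^)) * (p(u)/p(u^)) for a in a^, u in u^
    q = (p_block[row_labels][:, col_labels]
         * np.outer(p_row / p_rc[row_labels], p_col / p_cc[col_labels]))
    return q

def kl_divergence(p, q):
    """D(p || q), summed over the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

A smaller \(D(p\,\|\,q)\) means the co-clustering preserves more of the mutual information between ads and users, which is how the K-L measure reported in the experiments can be read; as Appendix C indicates, CCAM additionally weighs analogous divergence terms for the two augmented (feature) matrices.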
Notes
Morgenstern: http://www.morgenstern.com.tw/users2/index.php/.
References
Agarwal D, Merugu S (2007) Predictive discrete latent factor models for large scale dyadic data. In: KDD’07: proceedings of the thirteenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, San Jose, pp 26–35
Mild A, Reutterer T (2001) Collaborative filtering methods for binary market basket data analysis. In: Lecture notes in computer science, pp 302–313
Banerjee A, Dhillon I-S, Ghosh J, Merugu S, Modha D-S (2004) A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. In: KDD’04: proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, Seattle, pp 509–514
Blei D-M, Ng A-Y, Jordan M-I (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022. doi:10.1162/jmlr.2003.3.4-5.993
Chen G, Wang F, Zhang C (2009) Collaborative filtering using orthogonal nonnegative matrix tri-factorization. In: Information processing and management, IPM, pp 368–379
Cover T, Thomas J (1991) Elements of information theory. Wiley, New York
Dai W, Xue G-R, Yang Q, Yu Y (2007) Co-clustering based classification for out-of-domain documents. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, San Jose, California, USA, pp 210–219
Dhillon I-S (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: KDD’01: proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, California, pp 269–274
Dhillon I-S, Mallela S, Modha D-S (2003) Information theoretic co-clustering. In: KDD’03: proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 89–98
Ding C, He X, Simon H-D (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 5th SIAM international conference on data mining, Newport Beach, CA, USA, pp 606–610
Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorization for clustering. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, Philadelphia, PA, USA, pp 126–135
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18. doi:10.1145/1656274.1656278
Konstas I, Stathopoulos V, Jose J-M (2009) On social networks and collaborative recommendation. In: Proceedings of the 32nd international ACM SIGIR conference on research and development, Boston, MA, USA, pp 195–202
Li B, Yang Q, Xue X (2009) Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In: Proc of the 21st int’l joint conf on artificial intelligence (IJCAI 2009), pp 2052–2057
Long B, Zhang Z, Yu P-S (2005) Co-clustering by block value decomposition. In: KDD’05: proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM Press, Chicago, pp 635–640
Scott D-W (2009) Sturges’ rule. WIREs Comput Stat 1:303–306
Shafiei M, Milios E (2005) Model-based overlapping co-clustering
Shafiei M, Milios E (2006) Latent Dirichlet co-clustering. In: Proceedings of the sixth IEEE international conference on data mining (ICDM’06), Hong Kong, December 18–22, 2006, pp 542–551
Shan H, Banerjee A (2008) Bayesian co-clustering. In: Proceedings of the eighth IEEE international conference on data mining (ICDM’08), Pisa, December 15–19, 2008, pp 530–539
Shi K, Li L (2012) High performance genetic algorithm based text clustering using parts of speech and outlier elimination. Appl Intell. doi:10.1007/s10489-012-0382-8
Slonim N, Tishby N (2000) Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, Athens, Greece, pp 208–215
Sugiyama K, Hatano K, Yoshikawa M (2004) Adaptive web search based on user profile constructed without any effort from users. In: International world wide web conference proceedings of the 13th international conference on world wide web, New York, NY, USA, pp 675–684
Additional information
This paper is partially supported by the National Science Council, Taiwan, under grant NSC-100-2628-E-8-012-MY3.
Appendices
Appendix A: Proof of Lemma 1
For a fixed co-clustering \((\hat{A}, \hat{U})\), we can rewrite the loss in mutual information as a K-L divergence (relative entropy) measure:
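For orientation, the analogous single-matrix identity is proved for ITCC in [9]; in that paper's notation, with \(p\) a general joint distribution over the two clustered variables, it reads

\[
I(A;U) - I(\hat{A};\hat{U}) = D\bigl(p(A,U)\,\|\,q(A,U)\bigr),
\qquad
q(a,u) = p(\hat{a},\hat{u})\,p(a\mid\hat{a})\,p(u\mid\hat{u}),
\]

with \(p(a\mid\hat{a}) = p(a)/p(\hat{a})\) for \(a\in\hat{a}\); Lemma 1 states the corresponding decompositions for the distributions used by CCAM.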
Proof
By definition,
where the second step uses \(g(a \mid\hat{a})=\frac{g(a)}{g(\hat{a})}\) for \(a\in\hat{a}\), which justifies the equality that follows.
where the second step uses \(h(u \mid\hat{u})=\frac{h(u)}{h(\hat{u})}\) for \(u\in\hat{u}\), which justifies the equality that follows. □
Appendix B: Proof of Lemma 2
Proof
Since Eq. (18) is proved in [9], we focus on the remaining two equations.
The same argument then proves Eq. (20).
□
Appendix C: Proof of Theorem 2
The CCAM algorithm monotonically decreases the objective function in Eq. (6), since
Proof
Let \(\varPhi=\varphi\cdot D(h(U, L) \,\|\,\hat{h}^{(t+1)}(U, L))\). For \(t=1,3,\ldots,2T+1\):
The inequality follows from step 1 since \(C_{A}^{t+1}(a)\) is selected to minimize the objective function.
By an identical argument, we can prove Eq. (32) for \(t=2,4,\ldots,2T+2\). Let \(\varLambda=\lambda\cdot D(f(A, S) \,\|\,\hat{f}^{(t+1)}(A, S))\).
□
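Abstracting away the particular distributions, the argument above is the standard monotonicity proof for alternating minimization. The following Python sketch (function names are illustrative placeholders, not the paper's API) shows the control flow and the invariant that Theorem 2 establishes:

```python
def alternating_coclustering(reassign_ads, reassign_users, objective, T):
    """Schematic CCAM-style loop: odd steps update the ad clustering with the
    user clustering fixed; even steps swap the roles.  Each step is an argmin
    over one argument, so the objective value never increases."""
    history = [objective()]
    for t in range(1, 2 * T + 1):
        if t % 2 == 1:
            reassign_ads()      # minimize over C_A with C_U fixed
        else:
            reassign_users()    # minimize over C_U with C_A fixed
        history.append(objective())
        assert history[-1] <= history[-2] + 1e-12   # monotone decrease
    # The objective is a weighted sum of K-L divergences, hence bounded
    # below by zero, so the decreasing sequence of values converges.
    return history
```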
Cite this article
Wu, ML., Chang, CH. & Liu, RZ. Co-clustering with augmented matrix. Appl Intell 39, 153–164 (2013). https://doi.org/10.1007/s10489-012-0401-9