
Co-clustering with augmented matrix

Published in: Applied Intelligence

Abstract

Clustering plays an important role in data mining, as many applications use it as a preprocessing step for data analysis. Traditional clustering groups similar objects, whereas two-way co-clustering groups dyadic data (objects together with their attributes) simultaneously. Most co-clustering research focuses on a single correlation matrix, yet dyadic data often come with additional descriptions that could improve co-clustering performance. In this research, we extend ITCC (Information Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix. We propose CCAM (Co-Clustering with Augmented Matrix) to incorporate this augmented data for better co-clustering. We apply CCAM to the analysis of on-line advertising, where both ads and users must be clustered. The key data connecting ads and users is the user-ad link matrix, which records which ads each user has linked; ads and users also each have their own feature data, i.e. the augmented matrices. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiments use advertisement and user data from Morgenstern, a financial social website focused on the advertisement agency business. The results show that CCAM outperforms ITCC because it exploits the augmented matrix during clustering.
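As background, the K-L divergence used above as an evaluation measure can be computed as follows. This is a minimal numpy sketch; the function name and example distributions are illustrative, not the authors' implementation:

```python
import numpy as np

def kl_divergence(p, q):
    """K-L divergence D(p || q) between two discrete distributions.

    Uses the convention 0 * log(0/q) = 0 and assumes q > 0 wherever p > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# D(p || p) = 0, and D(p || q) >= 0 with equality iff p == q
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
```

A lower divergence between the empirical distribution and the co-clustered approximation indicates a better fit.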

Algorithm 1; Figs. 1–7 (preview thumbnails; captions not recoverable)


Notes

  1. Morgenstern: http://www.morgenstern.com.tw/users2/index.php/.

References

  1. Agarwal D, Merugu S (2007) Predictive discrete latent factor models for large scale dyadic data. In: KDD'07: proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, San Jose, pp 26–35

  2. Mild A, Reutterer T (2001) Collaborative filtering methods for binary market basket data analysis. In: Lecture notes in computer science, pp 302–313

  3. Banerjee A, Dhillon IS, Ghosh J, Merugu S, Modha DS (2004) A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. In: KDD'04: proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, Seattle, pp 509–514

  4. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022. doi:10.1162/jmlr.2003.3.4-5.993

  5. Chen G, Wang F, Zhang C (2009) Collaborative filtering using orthogonal nonnegative matrix tri-factorization. Inf Process Manag, pp 368–379

  6. Cover T, Thomas J (1991) Elements of information theory. Wiley, New York

  7. Dai W, Xue GR, Yang Q, Yu Y (2007) Co-clustering based classification for out-of-domain documents. In: KDD'07: proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, San Jose, CA, USA, pp 210–219

  8. Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: KDD'01: proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, pp 269–274

  9. Dhillon IS, Mallela S, Modha DS (2003) Information theoretic co-clustering. In: KDD'03: proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 89–98

  10. Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 5th SIAM international conference on data mining, Newport Beach, CA, USA, pp 606–610

  11. Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorization for clustering. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, Philadelphia, PA, USA, pp 126–135

  12. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1). doi:10.1145/1656274.1656278

  13. Konstas I, Stathopoulos V, Jose JM (2009) On social networks and collaborative recommendation. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval, Boston, MA, USA, pp 195–202

  14. Li B, Yang Q, Xue X (2009) Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In: Proceedings of the 21st international joint conference on artificial intelligence (IJCAI 2009), pp 2052–2057

  15. Long B, Zhang Z, Yu PS (2005) Co-clustering by block value decomposition. In: KDD'05: proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM Press, Chicago, pp 635–640

  16. Scott DW (2009) Sturges' rule. WIREs Comput Stat 1:303–306

  17. Shafiei M, Milios E (2005) Model-based overlapping co-clustering

  18. Shafiei M, Milios E (2006) Latent Dirichlet co-clustering. In: ICDM'06: proceedings of the sixth IEEE international conference on data mining, Hong Kong, December 18–22, 2006, pp 542–551

  19. Shan H, Banerjee A (2008) Bayesian co-clustering. In: ICDM'08: proceedings of the eighth IEEE international conference on data mining, Pisa, December 15–19, 2008, pp 530–539

  20. Shi K, Li L (2012) High performance genetic algorithm based text clustering using parts of speech and outlier elimination. Appl Intell. doi:10.1007/s10489-012-0382-8

  21. Slonim N, Tishby N (2000) Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, Athens, Greece, pp 208–215

  22. Sugiyama K, Hatano K, Yoshikawa M (2004) Adaptive web search based on user profile constructed without any effort from users. In: WWW'04: proceedings of the 13th international conference on World Wide Web, New York, NY, USA, pp 675–684


Author information


Correspondence to Chia-Hui Chang.

Additional information

This paper was partially supported by the National Science Council, Taiwan, under grant NSC-100-2628-E-8-012-MY3.

Appendices

Appendix A: Proof of Lemma 1

For a fixed co-clustering \((\hat{A}, \hat{U})\), the loss in mutual information can be rewritten as a K-L divergence (relative entropy) measure:

(29)

Proof

By definition,

where in the second step the equality follows from \(g(a \mid\hat{a})=\frac{g(a)}{g(\hat{a})}\) for \(a \in \hat{a}\).

where in the second step the equality follows from \(h(u \mid\hat{u})=\frac{h(u)}{h(\hat{u})}\) for \(u \in \hat{u}\). □
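For reference, the single-matrix counterpart of this identity, proved for ITCC in [9], can be stated as follows (notation adapted to this paper's \(p(A,U)\)):

$$ I(A;U) - I(\hat{A};\hat{U}) = D\bigl(p(A,U) \,\|\, q(A,U)\bigr), \qquad q(a,u) = p(\hat{a},\hat{u})\, p(a \mid \hat{a})\, p(u \mid \hat{u}), $$

where \(\hat{a}\) and \(\hat{u}\) denote the clusters containing \(a\) and \(u\). Lemma 1 presumably plays the same role for CCAM's combined objective, adding the analogous terms for the augmented matrices \(f(A,S)\) and \(h(U,L)\).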

Appendix B: Proof of Lemma 2

Proof

Since Eq. (18) is proved in [9], we focus on the remaining two equations.

(30)

The same argument proves Eq. (20).

(31)

 □
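In ITCC notation, the row-wise decomposition corresponding to Eq. (18), proved in [9], can be written as:

$$ D\bigl(p(A,U) \,\|\, q(A,U)\bigr) = \sum_{\hat{a}}\ \sum_{a:\, C_A(a)=\hat{a}} p(a)\, D\bigl(p(U \mid a) \,\|\, q(U \mid \hat{a})\bigr), $$

and Eqs. (19)–(20) are presumably the analogous decompositions of \(D(f(A,S) \,\|\, \hat{f}(A,S))\) and \(D(h(U,L) \,\|\, \hat{h}(U,L))\) over the same ad and user clusters.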

Appendix C: Proof of Theorem 2

The CCAM algorithm monotonically decreases the objective function of Eq. (6); that is,

$$ q^{(t)} (\hat{A} , \hat{U}) \geq q^{(t+1)} (\hat{A} , \hat{U}) $$
(32)

Proof

Let \(\varPhi=\varphi\cdot D(h(U, L) \,||\,\hat{h}^{(t+1)}(U, L))\). For t=1,3,…,2T+1.

The inequality follows from step 1 since \(C_{A}^{t+1}(a)\) is selected to minimize the objective function.

By using an identical argument, we can prove Eq. (32) for t=2,4,…,2T+2. Let \(\varLambda=\lambda\cdot D(f(A, S) \,||\,\hat {f}^{(t+1)}(A, S))\).

 □
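To make the argument concrete, the monotone decrease can be checked numerically on the single-matrix (ITCC) special case of the objective. The sketch below is an illustrative toy implementation, not the authors' code: it builds the co-cluster approximation \(q\), performs one row-cluster update, and verifies that the objective does not increase.

```python
import numpy as np

def cocluster_objective(P, rx, cy, kx, ky):
    """D(p || q) for the ITCC-style approximation
    q(x,y) = p(xh,yh) p(x|xh) p(y|yh), plus the conditionals
    q(y | xh) needed by the row-update step."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    R = np.eye(kx)[rx]                       # one-hot row-cluster assignments
    C = np.eye(ky)[cy]                       # one-hot column-cluster assignments
    Pc = R.T @ P @ C                         # cluster joint p(xh, yh)
    pxh, pyh = Pc.sum(axis=1), Pc.sum(axis=0)
    # q(y | xh) = p(yh | xh) * p(y) / p(yh), with yh the cluster of column y
    with np.errstate(divide="ignore", invalid="ignore"):
        pyh_given_xh = np.where(pxh[:, None] > 0, Pc / pxh[:, None], 0.0)
    q_y_xh = pyh_given_xh[:, cy] * (py / pyh[cy])   # shape (kx, m)
    p_y_x = P / P.sum(axis=1)[:, None]               # p(y | x)
    obj = 0.0
    for i in range(P.shape[0]):
        mask = p_y_x[i] > 0
        obj += px[i] * np.sum(p_y_x[i][mask]
                              * np.log(p_y_x[i][mask] / q_y_xh[rx[i]][mask]))
    return obj, q_y_xh

def row_update(P, rx, cy, kx, ky):
    """Reassign each row x to argmin over xh of D(p(Y|x) || q(Y|xh))."""
    _, q_y_xh = cocluster_objective(P, rx, cy, kx, ky)
    p_y_x = P / P.sum(axis=1)[:, None]
    new_rx = rx.copy()
    for i in range(P.shape[0]):
        mask = p_y_x[i] > 0
        kls = [np.sum(p_y_x[i][mask] * np.log(p_y_x[i][mask] / q_y_xh[h][mask]))
               for h in range(kx)]
        new_rx[i] = int(np.argmin(kls))
    return new_rx

rng = np.random.default_rng(0)
P = rng.random((8, 6)); P /= P.sum()   # strictly positive joint p(x, y)
rx = np.arange(8) % 2                  # initial row clusters (kx = 2)
cy = np.arange(6) % 3                  # initial column clusters (ky = 3)
obj_before, _ = cocluster_objective(P, rx, cy, 2, 3)
rx = row_update(P, rx, cy, 2, 3)
obj_after, _ = cocluster_objective(P, rx, cy, 2, 3)
assert obj_after <= obj_before + 1e-9  # the monotone decrease of Theorem 2
```

The same reasoning extends to the augmented matrices \(f(A,S)\) and \(h(U,L)\): each alternating step minimizes its own term of the weighted objective, so the sum cannot increase.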


About this article

Cite this article

Wu, ML., Chang, CH. & Liu, RZ. Co-clustering with augmented matrix. Appl Intell 39, 153–164 (2013). https://doi.org/10.1007/s10489-012-0401-9

