Abstract
Biclustering is a data mining method that simultaneously clusters the two dimensions of a matrix, its rows and its columns. A bicluster typically corresponds to a sub-matrix that exhibits some coherent tendency. A traditional biclustering task for categorical variables is to find heavy sub-graphs, which correspond to significant biclusters, i.e., biclusters with high co-occurrence values. Although algorithms have been proposed to extract sub-graph biclusters, they provide limited knowledge about the relative importance of individual biclusters and about the importance of the variables within each bicluster. To address these problems, there have been several attempts to employ Bayesian methods or mixture models based on information theory. Although these models can rank the biclusters and the variables of a specific bicluster, they do not aim to extract heavy sub-graph biclusters. Moreover, they constrain the search so that every cell in the matrix must belong to some bicluster. We attempt to mitigate these constraints by employing dual topic models. In particular, we propose a generalised Latent Dirichlet Allocation (LDA) topic model that obtains dual topics, i.e., topics in opposite directions: row topics and column topics. To achieve better topics, it applies joint reinforcement, i.e., it considers column topics while creating row topics, and vice versa. Heavy sub-graph biclusters, i.e., the associations with high co-occurrence, are extracted using thresholds. We demonstrate that the proposed model, Co-clustering for Dual Topic, is useful for obtaining heavy sub-graph biclusters by testing it on simulated data, a text corpus, and microarray gene expression data. The experimental results show that the biclusters extracted by the Co-clustering for Dual Topic model are better than those of traditional biclustering models.
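The threshold-based extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the row-topic and column-topic membership tables below are hand-written stand-ins for the output of a fitted dual topic model, and the function names `extract_biclusters` and `mean_weight`, the threshold value, and the toy matrix are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: memberships are hand-written, not learned by LDA.

def extract_biclusters(row_topic, col_topic, row_thr=0.6, col_thr=0.6):
    """For each topic, keep the rows and columns whose membership
    probability reaches the threshold; return (rows, cols) pairs."""
    n_topics = len(row_topic[0])
    biclusters = []
    for k in range(n_topics):
        rows = [i for i, p in enumerate(row_topic) if p[k] >= row_thr]
        cols = [j for j, p in enumerate(col_topic) if p[k] >= col_thr]
        if rows and cols:  # skip topics with an empty side
            biclusters.append((rows, cols))
    return biclusters


def mean_weight(matrix, rows, cols):
    """Average co-occurrence value inside the sub-matrix rows x cols."""
    total = sum(matrix[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Toy co-occurrence matrix with two heavy blocks and one background row/column.
matrix = [
    [9, 8, 0, 0, 1],
    [7, 9, 1, 0, 0],
    [0, 1, 8, 9, 0],
    [0, 0, 9, 7, 1],
    [1, 0, 0, 1, 0],
]

# Assumed membership probabilities (rows x topics and columns x topics).
row_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]]
col_topic = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]]

biclusters = extract_biclusters(row_topic, col_topic)
for rows, cols in biclusters:
    print(rows, cols, mean_weight(matrix, rows, cols))
```

In this toy setting the last row and column fall below the threshold for both topics, so they are assigned to no bicluster, which mirrors the abstract's point that not every cell has to belong to one.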
Change history
29 January 2020
The Editors have retracted this conference paper [1] following an investigation by Victoria University of Wellington, for having significant overlap with a conference paper [2] by different authors. The latter [2] was submitted to a conference before the former [1]. Xiaoying Gao and Ian Welch agree to this retraction; Santosh Kumar does not agree to this retraction.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kumar, S., Gao, X., Welch, I. (2016). RETRACTED CHAPTER: Co-clustering for Dual Topic Models. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_34
DOI: https://doi.org/10.1007/978-3-319-50127-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)