RETRACTED CHAPTER: Co-clustering for Dual Topic Models

Kumar, Santosh; Gao, Xiaoying; Welch, Ian

doi:10.1007/978-3-319-50127-7_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

The original version of this chapter was retracted: The retraction note to this chapter is available at https://doi.org/10.1007/978-3-319-50127-7_67

Abstract

Biclustering is a data mining method that simultaneously clusters the two dimensions of a matrix: its rows and its columns. A bicluster typically corresponds to a sub-matrix that exhibits some coherent tendency. A traditional biclustering task for categorical variables is to find heavy sub-graphs that correspond to significant biclusters, i.e., biclusters with high co-occurrence values. Although algorithms have been proposed to extract heavy sub-graph biclusters, they offer limited knowledge about the relative importance of individual biclusters, or about the importance of each variable within a bicluster. To address these problems, there have been several attempts to employ Bayesian methods or mixture models based on information theory. Although these can rank the biclusters and the variables within a specific bicluster, they do not aim at extracting heavy sub-graph biclusters. Moreover, these models constrain the search so that every cell in the matrix must belong to some bicluster. We attempt to relax these constraints by employing dual topic models. In particular, we first propose a generalised Latent Dirichlet Allocation (LDA) topic model that obtains dual topics, i.e., topics in opposite directions: row topics and column topics. To achieve better topics, it applies joint reinforcement, i.e., it considers column topics while creating row topics, and vice versa. Heavy sub-graph biclusters, i.e., those with high co-occurrence, are then extracted using thresholds. We demonstrate that our proposed model, Co-clustering for Dual Topic, is useful for obtaining heavy sub-graph biclusters by testing it on simulated data, a text corpus and microarray gene expression data. The experimental results show that the biclusters extracted by the Co-clustering for Dual Topic model are better than those of traditional biclustering models.
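
The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the row-topic and column-topic weights have already been estimated by some topic model, that row topic k is paired with column topic k, and that a single cut-off per direction is used. The names `extract_biclusters`, `mean_cooccurrence`, `row_thr` and `col_thr`, and the toy numbers, are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]
Bicluster = Tuple[Set[int], Set[int]]


def extract_biclusters(row_topics: Matrix, col_topics: Matrix,
                       row_thr: float, col_thr: float) -> List[Bicluster]:
    """Form one candidate bicluster per topic by thresholding weights.

    row_topics[i][k] is the weight of row i in topic k, and
    col_topics[j][k] is the weight of column j in topic k.
    Assumption: row topic k and column topic k describe the same block.
    """
    n_topics = len(row_topics[0])
    biclusters: List[Bicluster] = []
    for k in range(n_topics):
        rows = {i for i, w in enumerate(row_topics) if w[k] >= row_thr}
        cols = {j for j, w in enumerate(col_topics) if w[k] >= col_thr}
        # Rows or columns below every threshold stay unassigned, so not
        # every cell of the matrix is forced into a bicluster.
        if rows and cols:
            biclusters.append((rows, cols))
    return biclusters


def mean_cooccurrence(matrix: Matrix, rows: Set[int], cols: Set[int]) -> float:
    """Average cell value inside a bicluster, a simple 'heaviness' score."""
    total = sum(matrix[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Toy co-occurrence matrix with two dense blocks (made-up values).
counts = [
    [5, 4, 0, 0],
    [6, 5, 0, 1],
    [0, 0, 7, 6],
    [0, 1, 5, 8],
]
# Made-up topic weights standing in for the output of a dual topic model.
row_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
col_topics = [[0.85, 0.15], [0.9, 0.1], [0.05, 0.95], [0.1, 0.9]]

biclusters = extract_biclusters(row_topics, col_topics, row_thr=0.5, col_thr=0.5)
scores = [mean_cooccurrence(counts, rows, cols) for rows, cols in biclusters]
```

With these toy inputs the two dense blocks are recovered as separate biclusters, and the score gives one simple way to rank them. Raising either threshold shrinks the biclusters and can leave some rows or columns outside all of them.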

Change history

29 January 2020
The Editors have retracted this conference paper [1] following an investigation by Victoria University of Wellington, for having significant overlap with a conference paper [2] by different authors. The latter [2] was submitted to a conference before the former [1]. Xiaoying Gao and Ian Welch agree to this retraction, Santosh Kumar does not agree to this retraction.

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Santosh Kumar, Xiaoying Gao & Ian Welch

Corresponding author

Correspondence to Santosh Kumar.

Editor information

Editors and Affiliations

University of Tasmania, Hobart, Australia
Byeong Ho Kang
Auckland University of Technology, Auckland, New Zealand
Quan Bai

About this paper

Cite this paper

Kumar, S., Gao, X., Welch, I. (2016). RETRACTED CHAPTER: Co-clustering for Dual Topic Models. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science (LNAI), vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_34

DOI: https://doi.org/10.1007/978-3-319-50127-7_34
Published: 29 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)
