RETRACTED CHAPTER: Co-clustering for Dual Topic Models

  • Conference paper
AI 2016: Advances in Artificial Intelligence (AI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Abstract

Biclustering is a data mining method that simultaneously clusters the two dimensions of a matrix, its rows and its columns. A bicluster typically corresponds to a sub-matrix that exhibits some coherent tendency. A traditional biclustering task for categorical variables is to find heavy sub-graphs, which correspond to significant biclusters, i.e., biclusters with high co-occurrence values. Although algorithms have been proposed to extract heavy sub-graph biclusters, they provide limited knowledge about the relative importance of individual biclusters and about the importance of the variables within each bicluster. To address these problems, there have been several attempts to employ Bayesian methods or mixture models based on information theory. Although these models can rank the biclusters and the variables within a specific bicluster, they do not aim to extract heavy sub-graph biclusters. Moreover, they constrain the search so that every cell in the matrix must belong to some bicluster. We attempt to relax these constraints by employing dual topic models. In particular, we first propose a generalised Latent Dirichlet Allocation (LDA) topic model that obtains dual topics, i.e., topics in opposite directions: row topics and column topics. To achieve better topics, it applies joint reinforcement, i.e., it considers column topics while creating row topics, and vice versa. Heavy sub-graph biclusters, i.e., the associations with high co-occurrence, are then extracted using thresholds. We demonstrate that the proposed model, Co-clustering for Dual Topic, is useful for obtaining heavy sub-graph biclusters by testing it on simulated data, a text corpus and microarray gene expression data. The experimental results show that the biclusters extracted by the Co-clustering for Dual Topic model are better than those of traditional biclustering models.
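
The threshold-based extraction step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the thresholds are applied, so everything below is an assumption made for illustration: the function names `extract_bicluster` and `mean_cooccurrence`, the toy matrix, the topic weights and the threshold value 0.25 are all hypothetical, and the weights are simply given rather than learned by a topic model. The sketch pairs one row topic with one column topic, keeps the rows and columns whose weight reaches the threshold, and measures how heavy the resulting sub-matrix is.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def extract_bicluster(
    row_weights: List[float],
    col_weights: List[float],
    row_threshold: float,
    col_threshold: float,
) -> Tuple[List[int], List[int]]:
    """Keep the rows and columns whose topic weight reaches its threshold."""
    rows = [i for i, w in enumerate(row_weights) if w >= row_threshold]
    cols = [j for j, w in enumerate(col_weights) if w >= col_threshold]
    return rows, cols


def mean_cooccurrence(matrix: Matrix, rows: List[int], cols: List[int]) -> float:
    """Average cell value of the selected sub-matrix (0.0 if it is empty)."""
    cells = [matrix[i][j] for i in rows for j in cols]
    return sum(cells) / len(cells) if cells else 0.0


# Toy co-occurrence matrix with a heavy block in rows 0-1, columns 0-1.
counts = [
    [9, 8, 0, 1],
    [7, 9, 1, 0],
    [0, 1, 2, 1],
    [1, 0, 1, 2],
]

# Hypothetical weights for one row topic and its paired column topic.
# In practice these would come from a fitted topic model; here they are made up.
row_topic = [0.45, 0.40, 0.10, 0.05]
col_topic = [0.50, 0.35, 0.05, 0.10]

rows, cols = extract_bicluster(row_topic, col_topic, 0.25, 0.25)
density = mean_cooccurrence(counts, rows, cols)
print(rows, cols, density)
```

In this toy example rows 2 and 3 and columns 2 and 3 are left out of the bicluster, which mirrors the abstract's point that not every cell in the matrix needs to belong to some bicluster.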

Change history

  • 29 January 2020

    The Editors have retracted this conference paper [1] following an investigation by Victoria University of Wellington, for having significant overlap with a conference paper [2] by different authors. The latter [2] was submitted to a conference before the former [1]. Xiaoying Gao and Ian Welch agree to this retraction; Santosh Kumar does not agree to this retraction.

Author information

Corresponding author

Correspondence to Santosh Kumar.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kumar, S., Gao, X., Welch, I. (2016). RETRACTED CHAPTER: Co-clustering for Dual Topic Models. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_34

  • DOI: https://doi.org/10.1007/978-3-319-50127-7_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7

  • eBook Packages: Computer Science, Computer Science (R0)
