Finding Biclusters by Random Projections

  • Conference paper
Combinatorial Pattern Matching (CPM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient randomized algorithm that discovers the largest bicluster by random projections. A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.
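The definition and the random-projection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm as published: the function names `is_bicluster` and `largest_bicluster_by_projection`, the parameters `k` (number of columns sampled per round) and `iterations`, and the step that extends a row group to every column on which its rows agree are all assumptions made for illustration. Each round projects the rows onto a random set of `k` columns, groups rows that read the same projected string, and keeps the candidate with the largest area (rows times columns).

```python
import random
from collections import defaultdict


def is_bicluster(X, rows, cols):
    """Return True if every selected row reads the same string on the selected columns."""
    if not rows or not cols:
        return False
    first = [X[rows[0]][c] for c in cols]
    return all([X[r][c] for c in cols] == first for r in rows)


def largest_bicluster_by_projection(X, k, iterations, seed=0):
    """Hypothetical sketch: search for a large-area bicluster via random column projections.

    X is a list of equal-length rows of symbols (strings or lists).
    k is the number of columns sampled per round (an assumed parameter).
    Returns (rows, cols) for the best candidate seen; the result is always a
    valid bicluster but is not guaranteed to be the largest one.
    """
    rng = random.Random(seed)
    n_cols = len(X[0])
    best_rows, best_cols = [], []
    for _ in range(iterations):
        sample = rng.sample(range(n_cols), k)
        # Group row indices by the string they read on the sampled columns.
        groups = defaultdict(list)
        for i, row in enumerate(X):
            groups[tuple(row[c] for c in sample)].append(i)
        for rows in groups.values():
            # Extend to every column on which all rows of the group agree.
            cols = [c for c in range(n_cols) if len({X[r][c] for r in rows}) == 1]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, cols
    return best_rows, best_cols


if __name__ == "__main__":
    # Toy matrix: rows 0, 2 and 3 agree on columns 1, 2 and 4.
    X = ["abcxd", "bacyd", "cbcyd", "dbczd"]
    rows, cols = largest_bicluster_by_projection(X, k=2, iterations=50)
    print(rows, cols, is_bicluster(X, rows, cols))
```

Because the search is randomized, the chance of hitting the largest bicluster depends on `k` and on the number of rounds; choosing those values is the kind of question the paper's probabilistic analysis addresses, and the sketch makes no claim about it.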

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lonardi, S., Szpankowski, W., Yang, Q. (2004). Finding Biclusters by Random Projections. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_8

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive
