Finding Biclusters by Random Projections

Lonardi, Stefano; Szpankowski, Wojciech; Yang, Qiaofeng

doi:10.1007/978-3-540-27801-6_8


Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3109))

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching


Abstract

Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient randomized algorithm that discovers the largest bicluster by random projections. A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.
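To make the definition concrete, the following is a minimal Python sketch of one way a random-projection search for a large bicluster could look. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names, the parameters `k` (number of sampled columns) and `iterations`, and the strategy of grouping rows by their projected string and then keeping every column on which the group agrees are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def is_bicluster(X, rows, cols):
    """Return True if every selected row reads the same string on the selected columns."""
    if not rows or not cols:
        return False
    reference = [X[rows[0]][c] for c in cols]
    return all([X[r][c] for c in cols] == reference for r in rows)


def largest_bicluster_by_projection(X, k, iterations, seed=0):
    """Hypothetical sketch: search for a large-area bicluster by random projections.

    Each iteration samples k columns, groups the rows by the string they read
    on those columns, and extends each group to all columns on which its rows
    agree. The (rows, cols) pair with the largest area seen so far is kept.
    """
    rng = random.Random(seed)
    n_cols = len(X[0])
    best_rows, best_cols = [], []
    for _ in range(iterations):
        sample = rng.sample(range(n_cols), k)
        groups = defaultdict(list)
        for r, row in enumerate(X):
            groups[tuple(row[c] for c in sample)].append(r)
        for rows in groups.values():
            # Keep every column where all rows of the group carry one symbol.
            cols = [c for c in range(n_cols) if len({X[r][c] for r in rows}) == 1]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, cols
    return best_rows, best_cols


# Toy matrix: rows 0-2 agree on columns 0, 2 and 4 (a planted 3 x 3 bicluster).
X = ["axbyc", "azbwc", "aqbrc", "ddddd"]
rows, cols = largest_bicluster_by_projection(X, k=1, iterations=50)
print(rows, cols, is_bicluster(X, rows, cols))
```

Because the search is randomized, a single run can miss the largest bicluster; repeating the projection many times raises the chance that some sample falls entirely inside its columns. Every returned pair is a valid bicluster by construction, since only columns on which the whole row group agrees are kept.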



Author information

Authors and Affiliations

Dept. of Computer Science, University of California, Riverside, CA
Stefano Lonardi & Qiaofeng Yang
Dept. of Computer Sciences, Purdue University, West Lafayette, IN
Wojciech Szpankowski


Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Suleyman Cenk Sahinalp
Google Inc., 76 9th Av, 4th Fl., 10011, New York, NY
S. Muthukrishnan
Tom Sawyer Software, 94612, Oakland, CA, USA
Ugur Dogrusoz


About this paper

Cite this paper

Lonardi, S., Szpankowski, W., Yang, Q. (2004). Finding Biclusters by Random Projections. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_8


DOI: https://doi.org/10.1007/978-3-540-27801-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive
