Biclustering via structured regularized matrix decomposition

Zhong, Yan; Huang, Jianhua Z.

doi:10.1007/s11222-022-10095-1

Published: 29 April 2022

Statistics and Computing, Volume 32, article number 37 (2022)

Abstract

Biclustering is a machine learning problem that deals with the simultaneous clustering of the rows and columns of a data matrix. Complex structures in the data matrix, such as overlapping biclusters, have challenged existing methods. In this paper, we first provide a unified formulation of biclustering based on structured regularized matrix decomposition, which synthesizes various existing methods, and then develop a new biclustering method called BCEL based on this formulation. The biclustering problem is formulated as a penalized least-squares problem that approximates the data matrix \(\mathbf {X}\) by a multiplicative matrix decomposition \(\mathbf {U}\mathbf {V}^T\) with sparse columns in both \(\mathbf {U}\) and \(\mathbf {V}\). The squared \(\ell _{1,2}\)-norm penalty, also called the exclusive Lasso penalty, is applied to both \(\mathbf {U}\) and \(\mathbf {V}\) to help identify the rows and columns included in the biclusters. The penalized least-squares problem is solved by a novel computational algorithm that combines alternating minimization and the proximal gradient method. A subsampling-based procedure called stability selection is developed to select the tuning parameters and determine bicluster membership. BCEL is shown to be competitive with existing methods in simulation studies and in an application to a real-world single-cell RNA sequencing dataset.
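
The following is a minimal Python/NumPy sketch of the kind of problem described above; it is not the authors' BCEL implementation. It assumes the objective \(\tfrac12\|\mathbf X-\mathbf U\mathbf V^T\|_F^2+\lambda_u\sum_k\|\mathbf u_k\|_1^2+\lambda_v\sum_k\|\mathbf v_k\|_1^2\), i.e. that each exclusive Lasso group is one column \(\mathbf u_k\) or \(\mathbf v_k\) (one group per bicluster); the abstract does not state the grouping, so this is an assumption. Each sweep takes one proximal-gradient step on \(\mathbf U\) with \(\mathbf V\) fixed, then one on \(\mathbf V\) with \(\mathbf U\) fixed, using step size \(1/L\) where \(L\) is the Lipschitz constant of the block gradient. The proximal operator of \(\lambda\|\mathbf x\|_1^2\) is a soft-threshold at \(t=2\lambda S_k/(1+2\lambda k)\), where \(k\) is the number of surviving entries and \(S_k\) the sum of their magnitudes. The function names, the SVD warm start, the fixed iteration count and the tuning values are hypothetical, and the stability-selection step is not reproduced.

```python
import numpy as np


def prox_sq_l1(z, lam):
    """Prox of lam * ||x||_1**2 at z: argmin_x 0.5*||x - z||^2 + lam*(sum|x_i|)^2.

    The minimizer soft-thresholds z at t = 2*lam*S_k / (1 + 2*lam*k), where k is
    the number of surviving entries and S_k the sum of their |z_i|; k is found
    by scanning |z| in decreasing order.
    """
    a = np.sort(np.abs(z))[::-1]
    cums = np.cumsum(a)
    k = np.arange(1, a.size + 1)
    active = a * (1 + 2 * lam * k) > 2 * lam * cums  # True exactly for k <= k*
    if not active.any():
        return np.zeros(z.shape)
    kstar = np.flatnonzero(active)[-1] + 1
    t = 2 * lam * cums[kstar - 1] / (1 + 2 * lam * kstar)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def prox_columns(M, lam):
    """Apply prox_sq_l1 to each column of M (assumption: one group per column)."""
    return np.column_stack([prox_sq_l1(M[:, j], lam) for j in range(M.shape[1])])


def objective(X, U, V, lam_u, lam_v):
    """0.5*||X - U V^T||_F^2 + lam_u*sum_k ||u_k||_1^2 + lam_v*sum_k ||v_k||_1^2."""
    fit = 0.5 * np.sum((X - U @ V.T) ** 2)
    pen_u = lam_u * np.sum(np.abs(U).sum(axis=0) ** 2)
    pen_v = lam_v * np.sum(np.abs(V).sum(axis=0) ** 2)
    return fit + pen_u + pen_v


def fit_bicluster(X, K, lam_u, lam_v, n_iter=200):
    """Alternating proximal gradient: one prox step on U, then one on V, per sweep."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)  # hypothetical SVD warm start
    U = u[:, :K] * np.sqrt(s[:K])
    V = vt[:K].T * np.sqrt(s[:K])
    for _ in range(n_iter):
        # Gradient in U of 0.5*||X - U V^T||^2 is (U V^T - X) V; its Lipschitz
        # constant is the largest eigenvalue of V^T V.
        L = max(np.linalg.norm(V.T @ V, 2), 1e-12)
        U = prox_columns(U - (U @ V.T - X) @ V / L, lam_u / L)
        # Same update for V with the roles of U and V swapped.
        L = max(np.linalg.norm(U.T @ U, 2), 1e-12)
        V = prox_columns(V - (U @ V.T - X).T @ U / L, lam_v / L)
    return U, V


# Usage on synthetic data with one planted bicluster (rows 0-7, columns 0-4).
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(30, 20))
X[:8, :5] += 3.0
U, V = fit_bicluster(X, K=1, lam_u=1.0, lam_v=1.0)
print("rows:", np.flatnonzero(U[:, 0]), "cols:", np.flatnonzero(V[:, 0]))
```

In this sketch the rows with nonzero entries in \(\mathbf u_k\) and the columns with nonzero entries in \(\mathbf v_k\) form the \(k\)-th estimated bicluster. Because a single fixed \(\lambda\) both selects and shrinks, the abstract assigns the choice of tuning parameters and the final membership to stability selection rather than to one fit.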

Funding

Part of the work of Jianhua Z. Huang was done when he was with Texas A&M University and was partly supported by NSF Grants No. 1956219 and 1900990. Huang was also partly supported by funding from the Pengcheng Peacock Program of Shenzhen.

Author information

Authors and Affiliations

Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
Yan Zhong
School of Data Science, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China
Jianhua Z. Huang

Corresponding author

Correspondence to Yan Zhong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The following electronic supplementary material is available for this article.

Supplementary file 1 (PDF, 255 KB)

About this article

Cite this article

Zhong, Y., Huang, J.Z. Biclustering via structured regularized matrix decomposition. Stat Comput 32, 37 (2022). https://doi.org/10.1007/s11222-022-10095-1

Received: 29 August 2021
Accepted: 04 April 2022
Published: 29 April 2022
DOI: https://doi.org/10.1007/s11222-022-10095-1
