Abstract
Recent technological advances have enabled the collection of large amounts of data from multiple sources for the same individuals, spurring interest in statistical methods for analyzing such multi-source data. A central goal is to identify the structural associations of the multiple sources with multiple correlated responses, including individual, joint, and partially-joint structures. In this work, we propose a novel integrative sparse reduced-rank regression (iSRRR) model for identifying structural associations between multi-source data and multiple responses. The model is based on a structured decomposition of the coefficient matrix and uses a new constraint, based on orthogonal rotation, to ensure model identifiability. The constraint imposes a specific structure, termed quartimax-simple, on the loading matrix, which enhances interpretability when identifying the multi-source structures relevant to specific responses. We also propose an iterative algorithm for estimating the iSRRR model parameters. Simulation studies demonstrate the ability of the proposed method to identify the underlying structured associations between multi-source data and multiple responses. Applied to a multi-omics dataset with multiple drug responses, the method detects structured association patterns.
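The quartimax-simple constraint mentioned above comes from the factor-rotation literature: an orthogonal rotation is chosen so that the loading matrix concentrates each row's weight on as few columns as possible. As a rough, self-contained illustration of that idea only (not the authors' iSRRR algorithm), the sketch below applies a gradient-projection quartimax rotation, in the spirit of Jennrich's rotation algorithm, to a toy loading matrix; the function name quartimax_rotate and all numerical defaults are illustrative assumptions.

```python
import numpy as np

def quartimax_rotate(A, max_iter=500, tol=1e-8):
    """Rotate a loading matrix A (p x r) toward quartimax simplicity.

    Finds an orthogonal matrix T that (locally) maximizes the quartimax
    criterion sum((A @ T)**4) via a gradient-projection scheme.
    Returns the rotated loadings L = A @ T and the rotation T.
    """
    r = A.shape[1]
    T = np.eye(r)
    alpha = 1.0

    def f_and_grad(L):
        # Minimize f = -sum(L^4)/4; its gradient with respect to L is -L^3.
        return -np.sum(L ** 4) / 4.0, -(L ** 3)

    L = A @ T
    f, dL = f_and_grad(L)
    G = A.T @ dL                          # gradient of f with respect to T
    for _ in range(max_iter):
        # Project the gradient onto the tangent space of the orthogonal group.
        Gp = G - T @ ((T.T @ G + G.T @ T) / 2.0)
        if np.linalg.norm(Gp) < tol:
            break
        for _ in range(20):               # backtracking line search
            X = T - alpha * Gp
            U, _, Vt = np.linalg.svd(X)   # retract onto the orthogonal group
            T_new = U @ Vt
            L_new = A @ T_new
            f_new, dL_new = f_and_grad(L_new)
            if f_new < f - 0.5 * alpha * np.linalg.norm(Gp) ** 2:
                break
            alpha /= 2.0
        T, L, f, G = T_new, L_new, f_new, A.T @ dL_new
        alpha *= 2.0
    return L, T

# Toy check: a block-sparse loading matrix observed only up to an arbitrary
# rotation should be recovered (up to sign and column order).
rng = np.random.default_rng(0)
B = np.zeros((10, 2))
B[:5, 0] = 1.0
B[5:, 1] = 1.0
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
L_hat, _ = quartimax_rotate(B @ Q)
print(np.round(L_hat, 2))  # each row again loads on a single column
```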
Acknowledgements
This work was supported by Samsung Science and Technology Foundation under Project Number SSTF-BA2002-03.
Author information
Contributions
KK wrote the main manuscript text and prepared all figures and tables; SJ reviewed and edited the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, K., Jung, S. Integrative sparse reduced-rank regression via orthogonal rotation for analysis of high-dimensional multi-source data. Stat Comput 34, 2 (2024). https://doi.org/10.1007/s11222-023-10322-3