Abstract
Although the concept of sufficient dimension reduction was proposed long ago, studies in the literature have largely focused on properties of estimators of dimension-reduction subspaces in the classical “small p, large n” setting. Rather than the subspace, this paper considers directly the set of reduced predictors, which we believe is more relevant for subsequent analyses. A principled method is proposed for estimating a sparse reduction, based on a new representation of the well-known sliced inverse regression method. A fast and efficient algorithm is developed for computing the estimator. The asymptotic behavior of the new method is studied when the number of predictors, p, exceeds the sample size, n, providing a guide for choosing the number of sufficient dimension-reduction predictors. Numerical results, including a simulation study and a cancer-drug-sensitivity data analysis, are presented to examine the method's performance.
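For readers unfamiliar with sliced inverse regression, the following is a minimal sketch of the classical estimator (Li 1991) that the abstract refers to, not the sparse estimator proposed in this paper. The function name `sir_directions` and its parameters are illustrative assumptions. The sketch inverts the sample covariance and therefore assumes n > p, which is the classical setting the abstract contrasts with the high-dimensional case.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical sliced inverse regression (illustrative sketch, assumes n > p).

    Returns a (p, n_directions) array whose columns are unit-norm estimates
    of the dimension-reduction directions on the original predictor scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors: Z = (X - mean) @ cov^{-1/2}.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Partition the observations into slices of roughly equal size by sorted y.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Leading eigenvectors of M, mapped back to the original predictor scale.
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    top = vecs[:, ::-1][:, :n_directions]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

Calling `sir_directions(X, y, n_slices=10, n_directions=d)` on an n-by-p predictor matrix gives estimated directions, and `X @ B` gives the corresponding reduced predictors. When p exceeds n the sample covariance is singular and this sketch breaks down, which is the regime the paper targets.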
Acknowledgments
The research of Tao Wang is supported by the Natural Science Foundation of China (Grant No. 11601326). Mengjie Chen’s research is supported by NIH R01 CA082659. Hongyu Zhao’s research is supported by NIH R01 GM59507. Lixing Zhu’s research is supported by the Natural Science Foundation of China (Grant No. 11671042). The authors thank the Editor, the Associate Editor, and the anonymous reviewers for their helpful comments, which have resulted in significant improvements to the article.
Cite this article
Wang, T., Chen, M., Zhao, H. et al. Estimating a sparse reduction for general regression in high dimensions. Stat Comput 28, 33–46 (2018). https://doi.org/10.1007/s11222-016-9714-6
DOI: https://doi.org/10.1007/s11222-016-9714-6