A robust threshold t linear mixed model for subgroup identification using multivariate T distributions

Zhang, Rui; Qin, Guoyou; Tu, Dongsheng

doi:10.1007/s00180-022-01229-0

Original paper
Published: 23 May 2022

Computational Statistics, Volume 38, pages 299–326 (2023)

Rui Zhang¹, Guoyou Qin¹,² & Dongsheng Tu³

Abstract

Subgroup identification has emerged as a popular statistical tool for assessing heterogeneity in treatment effects based on specific patient characteristics. Recently, a threshold linear mixed-effects model was proposed to identify a subgroup of patients who may benefit from treatment with respect to longitudinal outcomes, based on whether a continuous biomarker exceeds an unknown cut-point. This model, however, assumes normal distributions for both the random effects and the error terms and may therefore be sensitive to outliers in the longitudinal outcomes. In this paper, we propose a robust subgroup identification method for longitudinal data by developing a robust threshold t linear mixed-effects model, in which the random effects and within-subject errors follow a multivariate t distribution with unknown degrees of freedom. The likelihood function is difficult to maximize directly, however, because the density function of a non-central t distribution is too complicated to compute and an indicator function is included in the definition of the model. We therefore propose a smoothed expectation conditional maximization algorithm, based on a gamma-normal hierarchical structure and a smooth approximation of the indicator function, to make inferences on the parameters in the model. Simulation studies are conducted to investigate the performance and robustness of the proposed method. As an application, the proposed method is used to identify a subgroup of patients with advanced colorectal cancer who may have a better quality of life when treated with cetuximab.
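To illustrate what a smooth approximation of an indicator function can look like, the following minimal Python sketch replaces I(w > c) with a logistic curve. The logistic form, the function name `smooth_indicator`, and the bandwidth `h` are assumptions made for illustration only; the abstract does not say which smoothing function the paper uses.

```python
import numpy as np

def smooth_indicator(w, c, h=0.05):
    """Logistic approximation to the indicator I(w > c).

    h is a hypothetical bandwidth: the smaller h is, the closer the
    curve is to a hard step at the cut-point c.
    """
    w = np.asarray(w, dtype=float)
    # 0.5 * (1 + tanh(x / 2)) equals 1 / (1 + exp(-x)) and avoids overflow.
    return 0.5 * (1.0 + np.tanh((w - c) / (2.0 * h)))

# Hypothetical biomarker values compared against a cut-point c = 0.5.
biomarker = np.array([0.10, 0.45, 0.50, 0.55, 0.90])
print(smooth_indicator(biomarker, c=0.5, h=0.05))  # smooth version
print((biomarker > 0.5).astype(float))             # hard indicator
```

Unlike the hard indicator, the smooth version is differentiable in c, which is what allows a gradient-based update of the cut-point inside a maximization step.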

Availability of data and material

Not applicable

Code availability

Not applicable

Funding

This work was partially supported by the National Natural Science Foundation of China (11871164); the Shanghai Special Program: Clinical Multidisciplinary Treatment System and Systems Epidemiology Research; the Three-year Action Program of Shanghai Municipality for Strengthening the Construction of Public Health System (GWV-10.1-XK05), Big Data and Artificial Intelligence Application; the Shanghai Municipal Science and Technology Major Project (ZD2021CY001); and the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Department of Biostatistics, School of Public Health and Key Laboratory of Public Health Safety, Fudan University, Shanghai, China
Rui Zhang & Guoyou Qin
Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
Guoyou Qin
Canadian Cancer Trials Group, Queen’s University, Kingston, Canada
Dongsheng Tu

Contributions

Not applicable

Corresponding author

Correspondence to Guoyou Qin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

(1) Proofs of Eqs. (15) and (17).

Since the gamma prior on $\tau _{i}$ is conjugate to the normal likelihood, the posterior distribution of $\tau _{i}|Y_{i}$ is a gamma distribution. Similarly, the posterior distribution of $\alpha _{i}|Y_{i}, \tau _{i}$ can be shown to be a normal distribution. Let $f(\cdot )$ denote a density function. Detailed derivations of Eqs. (15) and (17) follow.

Derivation of Eq. (15): Since

$$\begin{aligned}
f(\tau _{i}|Y_{i}) &\propto f(Y_{i}|\tau _{i})\times f(\tau _{i})\\
&\propto \tau _{i}^{\frac{n_{i}}{2}}\exp \left( -\frac{1}{2}(Y_{i}-X_{i}\beta -W_{i}\eta )^{\prime }\left( \frac{1}{\tau _{i}}(Z_{i}\varPhi Z_{i}^{\prime }+\sigma ^{2} I)\right) ^{-1}(Y_{i}-X_{i}\beta -W_{i}\eta )\right) \\
&\quad \times \tau _{i}^{\frac{\nu _{i}}{2}-1}\exp \left( -\frac{\nu _{i}}{2}\tau _{i}\right) \\
&= \tau _{i}^{\frac{n_{i}}{2}}\exp \left( -\frac{\delta _{i}^{2}(\beta ,\eta ,\varPhi ,\sigma ^{2},c)}{2}\tau _{i}\right) \times \tau _{i}^{\frac{\nu _{i}}{2}-1}\exp \left( -\frac{\nu _{i}}{2}\tau _{i}\right) \\
&= \tau _{i}^{\frac{\nu _{i}+n_{i}}{2}-1}\times \exp \left( -\frac{\delta _{i}^{2}(\beta ,\eta ,\varPhi ,\sigma ^{2},c)+\nu _{i}}{2}\tau _{i}\right)
\end{aligned}$$

where

$$\begin{aligned} \delta _{i}^{2}(\beta ,\eta ,\varPhi ,\sigma ^{2},c)=(Y_{i}-X_{i}\beta -W_{i}\eta )^{\prime }(Z_{i}\varPhi Z_{i}^{\prime }+\sigma ^{2} I)^{-1}(Y_{i}-X_{i}\beta -W_{i}\eta ). \end{aligned}$$

Therefore,

$$\begin{aligned} \tau _{i}|Y_{i}\sim \Gamma \left( \frac{\nu _{i}+n_{i}}{2},\frac{\nu _{i}+\delta _{i}^{2}(\beta ,\eta ,\varPhi ,\sigma ^{2},c)}{2}\right) . \end{aligned}$$
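The gamma posterior above can be evaluated numerically. The sketch below, written under assumptions, computes $\delta _{i}^{2}$ and the posterior shape, rate, and mean of $\tau _{i}$ for one subject; the function name `tau_posterior`, the toy design matrices, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def tau_posterior(y, X, beta, W, eta, Z, Phi, sigma2, nu):
    """Return (shape, rate, mean) of the gamma posterior of tau_i given Y_i."""
    n_i = y.shape[0]
    resid = y - X @ beta - W @ eta
    # Marginal scale matrix Z Phi Z' + sigma^2 I.
    V = Z @ Phi @ Z.T + sigma2 * np.eye(n_i)
    # Squared Mahalanobis distance delta_i^2.
    delta2 = float(resid @ np.linalg.solve(V, resid))
    shape = 0.5 * (nu + n_i)
    rate = 0.5 * (nu + delta2)
    return shape, rate, shape / rate

# Toy subject with n_i = 4 visits and a random intercept and slope.
t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), t])
Z = X.copy()
W = np.ones((4, 1))  # stands in for the threshold-dependent design W_i
beta = np.array([1.0, 0.5])
eta = np.array([0.8])
Phi = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma2, nu = 0.8, 4.0

y_clean = X @ beta + W @ eta
y_outlier = y_clean + np.array([0.0, 0.0, 6.0, 0.0])

for label, y in [("clean", y_clean), ("outlier", y_outlier)]:
    print(label, tau_posterior(y, X, beta, W, eta, Z, Phi, sigma2, nu))
```

The posterior mean $(\nu _{i}+n_{i})/(\nu _{i}+\delta _{i}^{2})$ shrinks as $\delta _{i}^{2}$ grows, so a subject with outlying observations receives a smaller weight than a subject whose observations lie close to the fitted mean.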

Derivation of Eq. (17): Since

$$\begin{aligned}
f(\alpha _{i}|Y_{i},\tau _{i}) &\propto f(Y_{i}|\alpha _{i},\tau _{i})\times f(\alpha _{i}|\tau _{i})\\
&\propto \exp \left( -\frac{1}{2}\left( Y_{i}-X_{i}\beta -Z_{i}\alpha _{i}-W_{i}\eta \right) ^{\prime }\left( \frac{1}{\tau _{i}}\sigma ^{2}I\right) ^{-1}\left( Y_{i}-X_{i}\beta -Z_{i}\alpha _{i}-W_{i}\eta \right) \right) \\
&\quad \times \exp \left( -\frac{1}{2}\alpha _{i}^{\prime }\left( \frac{1}{\tau _{i}}\varPhi \right) ^{-1}\alpha _{i}\right) \\
&\propto \exp \left( \alpha _{i}^{\prime }Z_{i}^{\prime }\left( \frac{1}{\tau _{i}}\sigma ^{2}I\right) ^{-1}\left( Y_{i}-X_{i}\beta -W_{i}\eta \right) \right. \\
&\quad \left. -\frac{1}{2}\alpha _{i}^{\prime }Z_{i}^{\prime }\left( \frac{1}{\tau _{i}}\sigma ^{2}I\right) ^{-1}Z_{i}\alpha _{i}-\frac{1}{2}\alpha _{i}^{\prime }\left( \frac{1}{\tau _{i}}\varPhi \right) ^{-1}\alpha _{i}\right) \\
&= \exp \left( \alpha _{i}^{\prime }Z_{i}^{\prime }\left( \frac{1}{\tau _{i}}\sigma ^{2}I\right) ^{-1}\left( Y_{i}-X_{i}\beta -W_{i}\eta \right) \right) \\
&\quad \times \exp \left( -\frac{1}{2}\alpha _{i}^{\prime }\left( \tau _{i}\left( \frac{1}{\sigma ^{2}}Z_{i}^{\prime }I^{-1}Z_{i}+\varPhi ^{-1}\right) \right) \alpha _{i}\right) \\
&\propto \exp \left( -\frac{1}{2}\left( \alpha _{i}-\varOmega _{i}Z_{i}^{\prime }\left( \sigma ^{2}I\right) ^{-1}\left( Y_{i}-X_{i}\beta -W_{i}\eta \right) \right) ^{\prime }\left( \frac{1}{\tau _{i}}\varOmega _{i}\right) ^{-1}\right. \\
&\quad \left. \left( \alpha _{i}-\varOmega _{i}Z_{i}^{\prime }\left( \sigma ^{2}I\right) ^{-1}\left( Y_{i}-X_{i}\beta -W_{i}\eta \right) \right) \right)
\end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \varOmega _{i}&= \left( \frac{1}{\sigma ^{2}}Z_{i}^{\prime }I^{-1}Z_{i}+\varPhi ^{-1}\right) ^{-1} \\ {}&=\varPhi -\varPhi Z_{i}^{\prime }(Z_{i}\varPhi Z_{i}^{\prime }+\sigma ^{2}I)^{-1}Z_{i}\varPhi \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} \begin{aligned}&\alpha _{i}|Y_{i}, \tau _{i} \sim N\left( \varPhi Z_{i}^{\prime }(Z_{i}\varPhi Z_{i}^{\prime }+\sigma ^{2}I)^{-1}\left( Y_{i}-X_{i}\beta -W_{i}\eta \right) \right. , \\&\quad \left. \frac{1}{\tau _{i}}\left( \varPhi -\varPhi Z_{i}^{\prime }(Z_{i}\varPhi Z_{i}^{\prime }+\sigma ^{2}I)^{-1}Z_{i}\varPhi \right) \right) . \end{aligned} \end{aligned}$$
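As a numerical sanity check on this result, the following sketch computes the posterior mean and covariance of $\alpha _{i}$ in two ways: in the marginal form stated above, and in the precision form $\varOmega _{i}=(Z_{i}^{\prime }Z_{i}/\sigma ^{2}+\varPhi ^{-1})^{-1}$ with mean $\varOmega _{i}Z_{i}^{\prime }(\sigma ^{2}I)^{-1}(Y_{i}-X_{i}\beta -W_{i}\eta )$. The function names and all numerical values are hypothetical, chosen only to show that the two forms agree.

```python
import numpy as np

def alpha_posterior(y, X, beta, W, eta, Z, Phi, sigma2, tau):
    """Mean and covariance of alpha_i given Y_i and tau_i (marginal form)."""
    n_i = y.shape[0]
    resid = y - X @ beta - W @ eta
    V = Z @ Phi @ Z.T + sigma2 * np.eye(n_i)
    # K = Phi Z' V^{-1}; V and Phi are symmetric, so solve and transpose.
    K = np.linalg.solve(V, Z @ Phi).T
    Omega = Phi - K @ Z @ Phi
    return K @ resid, Omega / tau

def alpha_posterior_precision(y, X, beta, W, eta, Z, Phi, sigma2, tau):
    """Same quantities computed from the precision form of Omega_i."""
    resid = y - X @ beta - W @ eta
    Omega = np.linalg.inv(Z.T @ Z / sigma2 + np.linalg.inv(Phi))
    return Omega @ Z.T @ resid / sigma2, Omega / tau

# Toy subject with n_i = 4 visits and a random intercept and slope.
t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), t])
Z = X.copy()
W = np.ones((4, 1))  # stands in for the threshold-dependent design W_i
beta = np.array([1.0, 0.5])
eta = np.array([0.8])
Phi = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma2, tau = 0.8, 1.5
y = np.array([2.1, 2.0, 3.4, 3.0])

mean_a, cov_a = alpha_posterior(y, X, beta, W, eta, Z, Phi, sigma2, tau)
mean_b, cov_b = alpha_posterior_precision(y, X, beta, W, eta, Z, Phi, sigma2, tau)
print(np.allclose(mean_a, mean_b), np.allclose(cov_a, cov_b))
```

The posterior mean does not depend on $\tau _{i}$; only the covariance is scaled by $1/\tau _{i}$. The marginal form needs an $n_{i}\times n_{i}$ solve, whereas the precision form inverts matrices of the random-effects dimension, which may be preferable when $n_{i}$ is large.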

(2) See Table 8

Table 8 Results of simulation studies over 500 replications under different degrees-of-freedom assumptions for sample sizes m = 100 and 400

(3) See Table 9

Table 9 Results of simulation studies over 500 replications for sample sizes m = 100 and 400, with one assumed to follow a normal distribution and the other a t distribution

About this article

Cite this article

Zhang, R., Qin, G. & Tu, D. A robust threshold t linear mixed model for subgroup identification using multivariate T distributions. Comput Stat 38, 299–326 (2023). https://doi.org/10.1007/s00180-022-01229-0

Received: 08 September 2021
Accepted: 18 April 2022
Published: 23 May 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00180-022-01229-0
