Comput Methods Programs Biomed. Author manuscript; available in PMC 2007 Jul 10.
PMCID: PMC1913221; HALMS: HALMS130037; PMID: 15848271

Estimation of linear mixed models with a mixture of distribution for the random effects

Abstract

The aim of this paper is to propose an algorithm to estimate linear mixed models when the random-effects distribution is a mixture of Gaussians. This heterogeneous linear mixed model relaxes the classical Gaussian assumption for the random effects and, when used for longitudinal data, can highlight distinct patterns of evolution. The observed likelihood is maximized using a Marquardt algorithm rather than the EM algorithm frequently used for mixture models; the EM algorithm is computationally expensive and provides neither good convergence criteria nor direct estimates of the variance of the parameters. The proposed method also makes it possible to classify subjects according to the estimated profiles by computing the posterior probabilities of belonging to each component. The use of the heterogeneous linear mixed model is illustrated through a study of the different patterns of cognitive evolution in the elderly. HETMIXLIN is a free Fortran90 program available on the web site: http://www.isped.u-bordeaux2.fr.

Keywords: Aged, Aging, Algorithms, Child, Cognition, Computers, Female, Humans, Linear Models, Longitudinal Studies, Normal Distribution, Software
Keywords: Heterogeneous mixed model, Mixture model, Longitudinal data, Newton-Raphson-like algorithm, Cognitive ageing

1 Introduction

Many longitudinal studies consist of assessing changes over time in a marker measured repeatedly on each participant. These analyses are generally performed with mixed models [1], which account for the within-subject correlation and the between-subject variability of the marker course. However, such a model relies on the strong assumption that the random effects are sampled from a single multivariate Gaussian distribution, which means that the marker course is homogeneous among all the subjects.

To assess this assumption, Verbeke and Lesaffre [2] proposed a mixed model with a mixture of multivariate Gaussians on the random effects. This heterogeneous linear mixed model relaxes the normality assumption for the random effects and also makes it possible to highlight distinct evolutions of the marker and to classify the subjects according to these different patterns of evolution.

In Verbeke and Lesaffre's work, as in more recent papers [3,4], mixed models with a mixture on the random-effects distribution were estimated using an EM algorithm [5]. For instance, Spiessens and Verbeke [3] recently proposed a free SAS-macro (HETNLMIXED) using the EM algorithm and the NLMIXED procedure for the optimization in the M step. This SAS-macro is an extension of the SAS-macro HETMIXED, which was developed earlier for estimating heterogeneous linear mixed models using the MIXED procedure [6]. To our knowledge, HETNLMIXED and its first version HETMIXED are the only freely available programs developed for estimating heterogeneous mixed models. The first version, HETMIXED, proved to be very slow and limited to small samples because of the handling of very large matrices and prohibitive computation times; it is not considered further in this work. HETNLMIXED was developed to reduce these computational problems and to allow estimation of both linear and generalized linear models. However, in the linear case, this SAS-macro has the drawback of computing numerically an integral over the random effects although it has a closed form, and the macro is thus limited to a small number of random effects. We have also observed convergence problems when using the macro with large samples, except for very simple models.

Moreover, the EM algorithm used in these macros has some general drawbacks. In particular, it does not have good convergence criteria: convergence is assessed only through the lack of progression of the likelihood or of the parameter estimates [7]. Furthermore, convergence is slow [8] and the EM algorithm does not provide direct estimates of the variance of the parameters. In the particular case of a heterogeneous mixed model, the M step also requires the estimation of a homogeneous mixed model, which is computationally expensive.

Therefore, the first aim of this paper is to propose a program for estimating more general heterogeneous linear mixed models that is suitable for large samples. The proposed program, HETMIXLIN, is written in Fortran90 and uses a direct maximization of the likelihood via a Marquardt optimization algorithm. The second objective of this paper is to illustrate the use of the heterogeneous linear mixed model through a study of the different patterns of evolution in cognitive ageing.

2 Computational methods and theory

2.1 The heterogeneous linear mixed model

Let Y_i = (Y_{i1}, …, Y_{in_i}) be the response vector for the n_i measurements of subject i, with i = 1, …, N. The linear mixed model [1] for the response vector Y_i is defined as:

Y_i = X_i\beta + Z_i u_i + \varepsilon_i
(1)

X_i is an n_i × p design matrix for the p-vector of fixed effects β, and Z_i is an n_i × q design matrix associated with the q-vector of random effects u_i, which represents the subject-specific regression coefficients. The errors ε_i are assumed to be normally distributed with mean zero and covariance matrix σ²I_{n_i}, and to be independent of the vector of random effects u_i.

In a homogeneous mixed model [1], u_i is normally distributed with mean μ and covariance matrix D, i.e.

u_i \sim N(\mu, D)
(2)

In the heterogeneous mixed model [2–4], u_i is assumed to follow a mixture of G multivariate Gaussians with different means (μ_g)_{g=1,…,G} and a common covariance matrix D, i.e.

u_i \sim \sum_{g=1}^{G} \pi_g N(\mu_g, D)
(3)

Each component g of the mixture has probability π_g, and the (π_g)_{g=1,…,G} satisfy the following conditions:

0 \le \pi_g \le 1 \quad \forall g = 1,\dots,G \quad \text{and} \quad \sum_{g=1}^{G} \pi_g = 1
(4)

In this work, we propose a slightly more general formulation of the model described in (1), in which the effect of some covariates may depend on the components of the mixture and some of the random effects may have a common mean whatever the mixture component. Thus, the design matrix X_i is split into X_{1i}, associated with the vector β of fixed effects common to all the components, and X_{2i}, associated with the vectors δ_g of fixed effects specific to the components. The design matrix Z_i is similarly split into Z_{1i}, associated with the vector v_i of random effects following a single Gaussian distribution, and Z_{2i}, associated with the vector u_i of random effects following a mixture of Gaussian distributions. The model is then written as:

Y_i = X_{1i}\beta + \sum_{g=1}^{G} w_{ig} X_{2i}\delta_g + Z_{1i}v_i + Z_{2i}u_i + \varepsilon_i
(5)

where w_{ig} equals 1 if subject i belongs to component g and 0 otherwise (see section 2.2), v_i ~ N(0, D_v) and u_i ~ Σ_{g=1}^{G} π_g N(μ_g, D_u); given the component g, the conditional distribution of the vector (v_i′, u_i′)′ is N((0′, μ_g′)′, D) with D = \begin{pmatrix} D_v & D_{vu} \\ D_{uv} & D_u \end{pmatrix}.
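To make the structure of this model concrete, the short Python sketch below simulates one subject's response vector: it draws the latent component with probabilities π_g, then draws (v_i, u_i) from the corresponding Gaussian and adds the component-specific fixed part X_{2i}δ_g, consistently with the conditional mean given in Eq. (7) of the next subsection. This is an illustration only, not part of HETMIXLIN; all function and variable names are ours.

import numpy as np

def simulate_subject(X1, X2, Z1, Z2, beta, delta, mu, pi, Dv, Du, Dvu, sigma2, rng):
    """Simulate one response vector Y_i from the heterogeneous linear mixed model.

    delta : list of G component-specific fixed-effect vectors delta_g
    mu    : list of G component-specific means mu_g of u_i
    pi    : the G component probabilities
    Dv, Du, Dvu : blocks of the joint covariance matrix D of (v_i, u_i)
    """
    g = rng.choice(len(pi), p=pi)                      # latent component (w_ig = 1)
    D = np.block([[Dv, Dvu], [Dvu.T, Du]])             # joint covariance of (v_i, u_i)
    mean = np.concatenate([np.zeros(Dv.shape[0]), mu[g]])
    b = rng.multivariate_normal(mean, D)               # (v_i, u_i) given component g
    v, u = b[:Dv.shape[0]], b[Dv.shape[0]:]
    eps = rng.normal(0.0, np.sqrt(sigma2), size=X1.shape[0])
    return X1 @ beta + X2 @ delta[g] + Z1 @ v + Z2 @ u + eps, g

# Example call (shapes only): rng = np.random.default_rng(0); Y_i, g = simulate_subject(...)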

2.2 Likelihood

Following previous works [3,4], we define w_{ig} as the unobserved variable indicating whether subject i belongs to component g, with P(w_{ig} = 1) = π_g. The density of the vector y_i can then be written as:

f_i(y_i) = \sum_{g=1}^{G} \pi_g f(y_i \mid w_{ig} = 1)
(6)

Given w_{ig}, y_i follows a linear mixed model, and the density f(y_i | w_{ig} = 1), denoted φ_{ig}, is the multivariate Gaussian density with mean E_{ig} and covariance matrix V_i given by:

E_{ig} = E(Y_i \mid w_{ig}=1) = X_{1i}\beta + X_{2i}\delta_g + Z_{2i}\mu_g \quad \text{and} \quad V_i = \mathrm{Var}(Y_i \mid w_{ig}=1) = Z_i D Z_i' + \sigma^2 I_{n_i}
(7)

Let now θ be the vector of the m parameters of the model. θ contains ψ = (β′, (δ_g′)_{g=1,…,G}, (μ_g′)_{g=1,…,G}, Vec(D)′, σ²)′ and π, the vector of the G − 1 first component probabilities (π_g)_{g=1,…,G−1}. Note that π_G is entirely determined by π as 1 − Σ_{g=1}^{G−1} π_g. Vec(D) represents the vector of the upper triangular elements of D. The estimates of θ are obtained as the vector θ̂ that maximizes the observed log-likelihood:

L(Y; \theta) = \sum_{i=1}^{N} \ln(f_i(y_i)) = \sum_{i=1}^{N} \ln\left(\sum_{g=1}^{G} \pi_g \varphi_{ig}(y_i)\right) = \sum_{i=1}^{N}\left[ -\frac{n_i}{2}\ln(2\pi) - \frac{1}{2}\ln|V_i| + \ln\left(\sum_{g=1}^{G} \pi_g \, e^{-\frac{1}{2}(Y_i - E_{ig})' V_i^{-1} (Y_i - E_{ig})}\right)\right]
(8)
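For illustration, a minimal Python sketch of the computation of the observed log-likelihood (8) is given below. It is not taken from HETMIXLIN (which is written in Fortran90); the use of scipy and of a log-sum-exp over components are our own choices for numerical stability.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def observed_loglik(y_list, E_list, V_list, pi):
    """Observed log-likelihood (8).

    y_list : list of N response vectors y_i
    E_list : list of N arrays of shape (G, n_i) holding the means E_ig of Eq. (7)
    V_list : list of N covariance matrices V_i of Eq. (7)
    pi     : the G component probabilities
    """
    logpi = np.log(pi)
    total = 0.0
    for y, E, V in zip(y_list, E_list, V_list):
        # ln( sum_g pi_g * phi_ig(y_i) ), with phi_ig the N(E_ig, V_i) density
        log_terms = [logpi[g] + multivariate_normal.logpdf(y, mean=E[g], cov=V)
                     for g in range(len(pi))]
        total += logsumexp(log_terms)
    return total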

2.3 Estimation procedure

We propose to maximize directly the observed log-likelihood (8) using a modified Marquardt optimization algorithm [9], a Newton-Raphson-like algorithm [10]. The diagonal of the Hessian at iteration k, H^{(k)}, is inflated to obtain a positive definite matrix H^{*(k)} = (H^{*(k)}_{ij}) with H^{*(k)}_{ii} = H^{(k)}_{ii} + λ[(1 − η)H^{(k)}_{ii} + η tr(H^{(k)})] and H^{*(k)}_{ij} = H^{(k)}_{ij} if i ≠ j. Initial values for λ and η are λ = 0.01 and η = 0.01; they are reduced when H* is positive definite and increased if not. The estimates θ^{(k)} are then updated to θ^{(k+1)} using the current modified Hessian H^{*(k)} and the current gradient g(θ^{(k)}) according to the formula:

\theta^{(k+1)} = \theta^{(k)} - \alpha \big(H^{*(k)}\big)^{-1} g(\theta^{(k)})
(9)

where, if necessary, α is modified to ensure that the log-likelihood is improved at each iteration.
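The sketch below illustrates one iteration of such a modified Marquardt step. It is written in Python for readability and in terms of the negative log-likelihood, so that the inflated Hessian is positive definite near the optimum; the absolute values added in the inflation and the specific updating rules for λ and α are our own simplifications, not necessarily those implemented in HETMIXLIN.

import numpy as np

def marquardt_step(theta, negloglik, grad, hess, lam=0.01, eta=0.01):
    """One Marquardt update of Eq. (9), written for the negative log-likelihood."""
    H = hess(theta)          # Hessian of -L at theta
    g = grad(theta)          # gradient of -L at theta
    while True:
        H_star = H.copy()
        # Inflate the diagonal: H*_ii = H_ii + lam * ((1 - eta)|H_ii| + eta |tr(H)|)
        # (absolute values are a safeguard added here; off-diagonal terms are unchanged)
        H_star[np.diag_indices_from(H_star)] = (
            np.diag(H) + lam * ((1 - eta) * np.abs(np.diag(H)) + eta * abs(np.trace(H))))
        try:
            np.linalg.cholesky(H_star)   # succeeds iff H_star is positive definite
            break
        except np.linalg.LinAlgError:
            lam *= 10.0                  # inflate more and retry
    step = np.linalg.solve(H_star, g)
    alpha, f0 = 1.0, negloglik(theta)
    theta_new = theta - alpha * step
    while negloglik(theta_new) > f0 and alpha > 1e-8:
        alpha /= 2.0                     # reduce alpha until -L decreases, i.e. L improves
        theta_new = theta - alpha * step
    return theta_new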

To ensure that the covariance matrix D is positive definite, we maximize the log-likelihood on the non-zero elements of U, the Cholesky factor of D (i.e. U′U = D) [7]. Furthermore, to deal with the constraints (4) on π, we use the transformed parameters (γ_g)_{g=1,…,G−1} with:

\gamma_g = \ln\!\left(\frac{\pi_g}{\pi_G}\right)
(10)

Standard errors of the elements of D and of (π_g)_{g=1,…,G−1} are computed by the Δ-method [11], while standard errors of the other parameters are computed directly using the inverse of the observed Hessian matrix.
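As an illustration, the following Python sketch implements the transformation (10), its inverse, and a generic Δ-method propagation of a covariance matrix; the forward-difference Jacobian is our own shortcut and the function names are hypothetical.

import numpy as np

def gamma_from_pi(pi):
    """gamma_g = ln(pi_g / pi_G) for g = 1, ..., G-1 (Eq. (10))."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[:-1] / pi[-1])

def pi_from_gamma(gamma):
    """Inverse transform: pi_g = exp(gamma_g) / (1 + sum_l exp(gamma_l)), pi_G = 1 - sum pi_g."""
    e = np.exp(np.asarray(gamma, dtype=float))
    return np.append(e, 1.0) / (1.0 + e.sum())

def delta_method_cov(transform, estimate, cov, h=1e-6):
    """Delta-method covariance of transform(estimate): J cov J', with a numerical Jacobian J."""
    estimate = np.asarray(estimate, dtype=float)
    f0 = np.asarray(transform(estimate))
    J = np.empty((f0.size, estimate.size))
    for j in range(estimate.size):
        step = np.zeros_like(estimate)
        step[j] = h
        J[:, j] = (np.asarray(transform(estimate + step)) - f0) / h
    return J @ cov @ J.T

# Example: standard errors of the pi_g from the covariance of the gamma_g
# se_pi = np.sqrt(np.diag(delta_method_cov(pi_from_gamma, gamma_hat, cov_gamma)))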

Convergence is reached when the three following criteria are satisfied: Σ_{j=1}^{m} (θ_j^{(k)} − θ_j^{(k−1)})² ≤ ε_a, |L^{(k)} − L^{(k−1)}| ≤ ε_b and g(θ^{(k)})′ (H^{(k)})^{−1} g(θ^{(k)}) ≤ ε_d. The default values are ε_a = 10^{−5}, ε_b = 10^{−5} and ε_d = 10^{−8}.
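Written out explicitly (illustrative Python, with our own variable names), the stopping rule reads:

import numpy as np

def converged(theta_new, theta_old, L_new, L_old, g, H,
              eps_a=1e-5, eps_b=1e-5, eps_d=1e-8):
    """Check the three convergence criteria on the parameters, the log-likelihood
    and the gradient (the last criterion uses the inverse of the Hessian H)."""
    crit_param = np.sum((theta_new - theta_old) ** 2) <= eps_a
    crit_lik   = abs(L_new - L_old) <= eps_b
    crit_grad  = g @ np.linalg.solve(H, g) <= eps_d
    return crit_param and crit_lik and crit_grad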

As the log-likelihood of a mixture model may have several maxima [8], we use a grid of initial values to find the global maximum. The multimodality of the log-likelihood in mixture models has often been discussed, and several authors have proposed strategies to choose the set of initial values [12], but none of them seems optimal in general. In our experience, the results were mainly sensitive to the initial values of (π_g)_{g=1,…,G−1} and (μ_g)_{g=1,…,G} and less sensitive to the other parameters (Vec(U), β and σ), for which the estimates from the homogeneous mixed model were good initial values.

A mixture model is estimated with a fixed number of components G, since otherwise the number of parameters in the model is unknown. To choose the number of components, one has to estimate models with different values of G and select the best model according to a test or a criterion. Some works favor a bootstrap approach to approximate the asymptotic distribution of the likelihood ratio test between models with different numbers of components [13], but this approach is computationally very demanding, in particular for mixture models with random effects. Criteria such as the Akaike Information Criterion (AIC) [14] or the Bayesian Information Criterion (BIC) [15] are often preferred; we use these criteria to select the optimal number of components.
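For reference, with L the maximized log-likelihood, m the number of parameters and N the number of subjects, AIC = −2L + 2m and BIC = −2L + m ln(N); penalizing by the number of subjects rather than the number of observations is an assumption, but it is consistent with the values reported in Tables 1 and 2. A trivial Python helper:

import numpy as np

def aic_bic(loglik, n_params, n_subjects):
    """AIC = -2L + 2m and BIC = -2L + m ln(N), N being the number of subjects."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_subjects)
    return aic, bic

# With the schoolgirl fits of Table 1 and our own count of free parameters (m = 9 for G = 2,
# m = 12 for G = 3, N = 20 subjects):
# aic_bic(-166.67, 9, 20)   -> approx. (351.3, 360.3)
# aic_bic(-165.94, 12, 20)  -> approx. (355.9, 367.8)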

2.4 A posteriori classification

After parameter estimation, mixture models allow subjects to be classified into the G components. The classification is based on the posterior probabilities (π_{ig})_{g=1,…,G} that subject i belongs to each of the G components. Using θ̂ = (ψ̂′, π̂′)′, these probabilities are obtained by Bayes' theorem [2–4] as:

\hat{\pi}_{ig} = P(w_{ig} = 1 \mid Y_i, \hat{\theta}) = \frac{\hat{\pi}_g \, \varphi_{ig}(\hat{\psi}, Y_i)}{\sum_{l=1}^{G} \hat{\pi}_l \, \varphi_{il}(\hat{\psi}, Y_i)}
(11)

Each subject i is then assigned to the component for which the posterior probability π̂_{ig} is the highest.
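A direct transcription of (11) could look as follows (Python sketch with illustrative names; log densities and a log-sum-exp are used for numerical stability):

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def posterior_probabilities(y, E, V, pi):
    """Posterior probabilities (pi_i1, ..., pi_iG) of Eq. (11) for one subject,
    and the assigned component (the one with the highest posterior probability).

    y  : response vector y_i
    E  : array (G, n_i) of conditional means E_ig
    V  : conditional covariance matrix V_i
    pi : estimated component probabilities
    """
    log_num = np.array([np.log(pi[g]) + multivariate_normal.logpdf(y, mean=E[g], cov=V)
                        for g in range(len(pi))])
    post = np.exp(log_num - logsumexp(log_num))   # normalise by sum_g pi_g phi_ig
    return post, int(np.argmax(post))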

3 Program description

The program requires two distinct input files: the data file described in appendix 1 and the parameter file named HETMIXLIN.inf which contains the information needed for the estimation of the model: the names of the data file and output files, the number of subjects, the description of the model (number of components G, dimension of the random effects, covariates X1, X2, Z1 and Z2 and covariance structure of D) and the initial values of the parameters. An example of the parameter file is given in appendix 2.

The main output file gives the final log-likelihood, the AIC, the BIC, the convergence criteria, the number of iterations and the parameter estimates with their standard errors, Wald statistics and 95% confidence intervals. The number of subjects classified in each component is also given.

Finally, another output file contains the posterior probabilities for each subject to belong to each class and the final class membership.

4 Applications

4.1 The height of schoolgirls

We consider the sample of 20 preadolescent schoolgirls introduced by Goldstein [16]. Verbeke and Lesaffre [2] and Komárek [6] modelled their height growth curves as a function of age from 6 to 10 years. Using the homogeneous mixed model, they showed that the height course of the girls differed significantly according to the category of height of their mother (small, medium and tall). They therefore used the heterogeneous linear mixed model, without introducing the height of the mother in the model, to try to highlight clusters of girls with distinct growth curves. In this work, we compare the results obtained using our program with those obtained with the HETNLMIXED SAS-macro, which uses the EM algorithm. The model is written as:

\text{Height}_{ij} = u_{0i} + u_{1i} \times \text{age}_{ij} + \varepsilon_{ij}
(12)

where u_i = (u_{0i}, u_{1i})′ ~ Σ_{g=1}^{G} π_g N(μ_g, D) with μ_g = (μ_{0g}, μ_{1g})′ and ε_{ij} ~ iid N(0, σ²).

We fitted the heterogeneous linear mixed model for two and three components. An extract of the data file and the parameter file for the model with two components are presented in appendices 1 and 2. The results for the model with two components of mixture obtained with our program HETMIXLIN and with the SAS-macro HETNLMIXED are shown in Table 1. The estimates obtained using the two methods are the same, but a difference is observed in the standard error estimates: the standard errors from HETNLMIXED are larger than those from HETMIXLIN. This difference between the two algorithms was also observed in the homogeneous case when comparing HETNLMIXED, the MIXED procedure, the NLMIXED procedure and the HETMIXLIN program. The latter three programs estimate standard errors by the inverse of the Hessian matrix, which estimates the Fisher information matrix [11], and led to the same standard error estimates. By contrast, HETNLMIXED uses an approximation of Louis's method based on the product of the expectations of the gradient of the complete likelihood [17], Louis's method [18] being itself an approximation of the observed Hessian matrix. This method appeared to overestimate the standard errors in this small sample. However, in our experience, this approximation of the observed Hessian matrix seemed to improve when the sample size increased. For instance, for a linear mixed model estimated on the 1,392 subjects of the PAQUID sample from section 4.2, the discrepancy was lower.

Table 1

Estimates and standard errors of the heterogeneous linear mixed model with two components of mixture for the height of schoolgirls, using HETMIXLIN (the proposed direct maximization using a Marquardt algorithm) and HETNLMIXED (Spiessens et al.'s SAS-macro using an EM algorithm), and estimates of the heterogeneous linear mixed model with three components of mixture using HETMIXLIN.

Parameter       | G = 2 HETNLMIXED     | G = 2 HETMIXLIN      | G = 3 HETMIXLIN
                | Estimate     SE*     | Estimate     SE**    | Estimate     SE**
π1              | 0.68         0.14    | 0.68         0.12    | 0.50         0.18
π2              | 0.32                 | 0.32                 | 0.30         0.11
π3              |                      |                      | 0.20
μ01             | 82.8         1.12    | 82.8         0.91    | 84.2         1.18
μ11             | 5.38         0.091   | 5.38         0.086   | 5.32         0.10
μ02             | 81.9         2.01    | 81.9         1.52    | 81.7         1.12
μ12             | 6.44         0.18    | 6.44         0.15    | 6.47         0.12
μ03             |                      |                      | 79.4         2.19
μ13             |                      |                      | 5.60         0.20
Var(u0i)        | 6.47         4.94    | 6.47         3.13    | 3.50         2.39
cov(u0i, u1i)   | 0.13         0.40    | 0.13         0.35    | 0.32         0.13
Var(u1i)        | 0.034        0.056   | 0.034        0.030   | 0.030        0.024
σ               | 0.69         0.10    | 0.69         0.063   | 0.68         0.06

−L              |                      | 166.67               | 165.94
AIC             |                      | 351.35               | 355.87
BIC             |                      | 360.32               | 367.82

*Standard errors obtained using Louis's method.
**Standard errors obtained using the inverse of the Hessian matrix and the Δ-method for the component probabilities and the variance parameters.

As the convergence of the two algorithms depends on the choice of the initial values, we fitted the model with both approaches using the same grid of 32 sets of initial values, which differed on all types of parameters. HETNLMIXED provided the global maximum 11 times out of the 32 tries, whereas our program found it 23 times. HETMIXLIN was also faster than the HETNLMIXED SAS-macro (several seconds compared with at least several minutes on a dual-Xeon 3.06 GHz with 1024 MB RAM).

The results for the model with three components of mixture are also shown in Table 1, but we cannot compare our results with those given by HETNLMIXED since it converged toward a non-positive definite D matrix. Indeed, HETNLMIXED uses the NLMIXED procedure, which does not constrain D to be positive definite.

4.2 Cognitive decline in the elderly

The second example illustrates the use of heterogeneous linear mixed models with our estimation method on a large data set such as those encountered in epidemiological studies. The aim of this analysis is to describe, in a cohort of elderly subjects, the heterogeneity of the evolution of the Mini-Mental State Examination (MMSE), the most important psychometric test to evaluate dementia and cognitive impairment, and to compare the classification of subjects derived from the mixture model with the dementia diagnosis. The MMSE score ranges from 0 to a maximum of 30 points.

Data come from the French prospective cohort study PAQUID, initiated in 1988 to study normal and pathological ageing [19]. The cohort includes 3,777 subjects aged 65 years and older who were living at home in southwestern France at baseline. Subjects were interviewed at baseline and were seen again 1 (T1), 3 (T3), 5 (T5), 8 (T8) and 10 (T10) years after the baseline visit (T0). At each visit, a battery of psychometric tests was completed and a diagnosis of dementia was made. In this analysis, we excluded data from T0 because of a learning effect previously described for the cognitive tests between T0 and T1 [20]. We studied the evolution of the MMSE between T1 and T8 for subjects free of dementia until T5 and compared the estimated classification with the dementia diagnosis at T8 and then with the health status at T10. We excluded subjects not seen at T8 to ensure that their diagnosis at this visit was available. This led to a sample of 1,392 subjects with between 1 and 4 MMSE measurements between T1 and T8.

The model is a quadratic function of time adjusted for covariates associated with cognitive evolution, in order to exclude heterogeneity introduced by known factors. The time scale (t_{ij} for subject i at visit j) is the negative of the delay between the measurement and the T8 visit (time is zero at the diagnosis time T8). We model the square root of the number of errors to satisfy the normality assumption of the error terms. The model is written as:

Y_{ij} = \sqrt{30 - \text{MMSE}_{ij}} = \beta' X_{ij} + u_{0i} + u_{1i} t_{ij} + u_{2i} t_{ij}^2 + \varepsilon_{ij}
(13)

where X_{ij} is the vector of covariates for subject i at visit j, including age, occupation, educational level, living place and interactions with time for age and educational level; u_i = (u_{0i}, u_{1i}, u_{2i})′ ~ Σ_{g=1}^{G} π_g N(μ_g, D) with μ_g = (μ_{0g}, μ_{1g}, μ_{2g})′ and ε_{ij} ~ iid N(0, σ²).

We fitted the heterogeneous linear mixed model with two components of mixture. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were largely improved compared with the homogeneous mixed model (ΔAIC = 98.7; ΔBIC = 77.8). Table 2 displays the parameter estimates for the homogeneous and the heterogeneous linear mixed models obtained with HETMIXLIN. The heterogeneous linear mixed model distinguished two different MMSE courses (Figure 1). The first class, including 98% of the sample, follows a linear evolution with a slight decline of 0.42 points (σ = 0.15) per year. The second class, including 2% of the sample, follows a nonlinear evolution with a decline that accelerates from the second follow-up to the end.

Figure 1. Mean curves of the MMSE between T1 and T8 for the heterogeneous linear mixed model with two components of mixture, for a 70-year-old worker with no education living in Dordogne.

Solid line: first component with a probability of 98%

Dashed line: second component with a probability of 2%

Table 2

Estimates of the homogeneous linear mixed model and of the heterogeneous linear mixed model with two components of mixture, obtained with the HETMIXLIN program, for the MMSE evolution adjusted for age, occupation, educational level, living place and interactions with time for age and educational level.

Parameter       | Homogeneous model (G = 1)  | Heterogeneous model (G = 2)
                | Estimate      SE           | Estimate      SE
π1              | 1                          | 0.98          0.005
π2              |                            | 0.02
μ01             | −1.92         0.32         | −1.51         0.31
μ11             | −0.57         0.16         | −0.42         0.16
μ21             | −0.042        0.018        | −0.028        0.024
μ02             |                            | 0.89          0.36
μ12             |                            | 0.46          0.19
μ22             |                            | 0.055         0.028
Var(u0i)        | 0.43          0.025        | 0.32          0.024
cov(u0i, u1i)   | 0.074         0.010        | 0.034         0.0095
Var(u1i)        | 0.066         0.014        | 0.029         0.013
cov(u0i, u2i)   | 0.042         0.0073       | 0.028         0.0069
cov(u1i, u2i)   | 0.052         0.010        | 0.038         0.010
Var(u2i)        | 0.073         0.016        | 0.060         0.016
σ               | 0.45          0.009        | 0.45          0.009

−L              | 4662.9                     | 4609.5
AIC             | 9367.7                     | 9269.0
BIC             | 9477.8                     | 9400.0

We then evaluated whether the cognitive profiles highlighted by the model were associated with the dementia diagnosis. Among the 1,392 subjects, 26 (1.9%) were classified in the second component with the nonlinear decline (Table 3). Among them, 21 had a positive dementia diagnosis at T8. The positive predictive value (81%) and the specificity (99.6%) of this classification are high but the sensitivity is poor (31%): 47 of the 68 subjects diagnosed as demented at T8 were not detected by the model. These subjects were significantly less disabled (p < 0.0001) than the other demented subjects (23% of disabled people vs. 76%) and had a significantly lower educational level (p = 0.001): 51% had no education or no diploma from primary school vs. 9.5% in the demented group detected by the model. All of the 5 subjects without dementia but classified in the declining group were disabled, 2 died within the following two years and 1 had a positive dementia diagnosis at T10.

Table 3

Relationship between the classification derived from the heterogeneous linear mixed model with two components and the dementia diagnosis at T8.

Dementia diagnosis at T8   | Linear class | Nonlinear class | Total
Positive                   | 47           | 21              | 68
Negative                   | 1319         | 5               | 1324
Total                      | 1366         | 26              | 1392

The association between cognitive ageing and educational level is an important issue [21]: educational level could have a different effect on normal cognitive ageing and on the decline preceding dementia. As HETMIXLIN allows distinct covariate effects to be specified per component, we fitted the model in which the interactions between educational level and time (educational level × t and educational level × t²) differed according to the components. The improvement in the log-likelihood was not significant (Δ(−2L) = 3.4; p = 0.17). Thus the association between educational level and cognitive evolution appears to be similar in the two subpopulations. Moreover, the discrimination of demented people was not improved with this model.

Because of the limitations of HETNLMIXED explained in section 1, we did not compare our approach with this program on the PAQUID data set. However, as Newton-Raphson algorithms have been criticized for their global convergence behavior compared with the EM algorithm [8], we compared the convergence performance of our program with that of an EM algorithm we developed in Fortran90. The comparison was thus free from the limitations due to the SAS environment and to the use of the NLMIXED procedure. The EM algorithm we developed uses a Marquardt optimization in the M step (convergence criteria: ε_a = 10^{−2}, ε_b = 10^{−2} and ε_d = 10^{−3}), and global convergence is reached when two successive values of the likelihood differ by less than 10^{−8}. This algorithm was first tested on the schoolgirl data. It was faster and converged more often than HETNLMIXED: the global maximum was reached 22 times out of the 32 tries with a mean computational time of around 30 seconds (versus 11 times out of the 32 tries and at least several minutes for HETNLMIXED). On the PAQUID data set, HETMIXLIN and the EM algorithm we implemented led to the same parameter estimates. Among the 15 sets of initial values, the two programs provided the global maximum an equivalent number of times (9 times for HETMIXLIN vs. 10 times for the EM algorithm) but HETMIXLIN was much faster: the CPU time was less than 10 minutes for HETMIXLIN and more than 2 hours for the EM algorithm.

5 Availability of the program and hardware specification

The program HETMIXLIN is written in Fortran90 and all the subroutines needed in the program are provided. The Fortran source code HETMIXLIN.f, an example of HETMIXLIN.inf, a documentation HETMIXLIN.pdf and the example data file for the schoolgirls are available at no charge on the web site: http://www.isped.u-bordeaux2.fr.

Two versions are provided on the web site: one for Unix and one for Windows. The Windows version includes an executable file (a DOS application) and does not require a Fortran90 compiler, whereas the Unix version needs to be compiled. The Unix version has been tested with the Intel Fortran Compiler for Linux (versions 7 and 8), the Compaq Fortran90 compiler for Alpha and Forte Developer 6 update 2 on Solaris SPARC. Examples of the compilation command are given in the documentation HETMIXLIN.pdf.

6 Conclusion

We proposed in this paper a Newton-Raphson-like algorithm to estimate heterogeneous linear mixed models. The main advantages of Newton-Raphson-like algorithms are their speed of convergence, the availability of good convergence criteria based on the derivatives of the likelihood, and direct estimates of the variance of the parameters via the inverse of the Hessian matrix. Moreover, using a simple modification of the Marquardt algorithm, we ensure the monotonicity of the algorithm, which is considered a main advantage of the EM algorithm [8].

We compared our program HETMIXLIN with a SAS-macro developed by Spiessens et al. using an EM algorithm. This SAS-macro can estimate heterogeneous generalized linear mixed models, but when the model is linear it has the drawback of computing numerically an integral over the random effects although it has a closed form. Our program HETMIXLIN can estimate more complex linear models (models with a larger number of mixture components, a larger number of random effects and more covariate effects depending on the mixture components), is suitable for much larger samples, and converges faster.

This paper also illustrates the usefulness of heterogeneous linear mixed models in a study of cognitive ageing. These models make it possible to highlight various evolution profiles while taking covariates into account. Cross-classifying the groups defined by the model with clinical events in the following years makes it possible to evaluate whether the cognitive profiles are associated with different clinical evolutions.

As a conclusion, we hope this work will improve the availability and the use of heterogeneous linear mixed models.

Appendix 1: Extract of the schoolgirl data file (first two subjects)

1←identification number of the unit (subject)
5←number of measures
111 116.4 121.7 126.3 130.5←row vector of the ni responses
1 1 1 1 1←row vector of the first covariate
6 7 8 9 10←row vector of the second covariate
2←identification number of the next unit (subject)
5←number of measures
110 115.8 121.5 126.6 131.4←row vector of the ni responses
1 1 1 1 1←row vector of the first covariate
6 7 8 9 10←row vector of the second covariate
3←identification number of the next unit (subject)
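To clarify this layout, the following small Python reader (our own sketch, not part of the HETMIXLIN distribution) parses a data file in the format shown above, assuming the annotations after the arrows are not present in the actual file and that each subject block contains one row per covariate:

def read_hetmixlin_data(path, n_cov):
    """Read a HETMIXLIN-style data file: for each subject, an identifier line,
    the number of measures, one row of responses, then one row per covariate."""
    subjects = []
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    i = 0
    while i < len(lines):
        ident = lines[i][0]                          # identification number of the unit
        n_i = int(lines[i + 1][0])                   # number of measures
        y = [float(v) for v in lines[i + 2][:n_i]]   # row vector of the n_i responses
        covs = [[float(v) for v in lines[i + 3 + k][:n_i]] for k in range(n_cov)]
        subjects.append({"id": ident, "y": y, "covariates": covs})
        i += 3 + n_cov
    return subjects

# Example: read_hetmixlin_data("schoolgirls.txt", n_cov=2)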

Appendix 2: Example of the parameter file

An example of HETMIXLIN.inf used in the application on the height of schoolgirls is given below. Note that each requested piece of information is preceded by a line describing it.

→ Filename for the data

schoolgirls.txt

→Filename for the output

girls.out

→Title of the procedure (in inverted commas)

‘G=2: school girls’

→Number of units (subjects)

20

→Number of mixture components (G) and, if and only if G>1, the initial values for the first G-1 component probabilities on the following line and the filename for the posterior probabilities on the line after

2

0.5

p.out

→Number of explanatory variables (including the intercept) in the data file

2

→Indicator that the explanatory variable is in the model (1 if present 0 if not)

1 1

→Indicator of random effect for each variable in the model (variables included in Z1 or Z2)

1 1

→Indicator of mixture for each variable in the model (variables included in X2 or Z2)

1 1

→Initial values for the fixed effects. First, the initial values for the common fixed effects (without mixture) in the same order as in the data file, then the initial values for the covariates with a mixture (G values per covariate), e.g. b1 b3 b21 b22 b23 b41 b42 b43 for a mixture on the second and fourth covariates and G=3

86 80 5 7

→Indicator of the random-effect covariance matrix structure (0 if unstructured matrix/1 if diagonal matrix)

0

→Initial values for the variance-covariance parameters of the random effects (1/2 superior matrix column by column)

3 1 1

→Initial value for the variance of the independent Gaussian errors

1

References

1. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74.
2. Verbeke G, Lesaffre E. A linear mixed-effects model with heterogeneity in the random-effects population. JASA. 1996;91:217–21.
3. Spiessens B, Verbeke G, Komárek A. A SAS-macro for the classification of longitudinal profiles using mixtures of normal distributions in nonlinear and generalized linear models. 2002. http://www.med.kuleuven.ac.be/biostat/research/software.htm.
4. Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–9.
5. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Statist Soc Ser B. 1977;39:1–38.
6. Komárek A. A SAS-macro for linear mixed models with a finite normal mixture as random-effects distribution. 2001. http://www.med.kuleuven.ac.be/biostat/research/software.htm.
7. Lindstrom MJ, Bates DM. Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. JASA. 1988;83:1014–22.
8. Redner RA, Walker HF. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review. 1984;26:195–239.
9. Marquardt D. An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math. 1963;11:431–41.
10. Fletcher R. Practical Methods of Optimization. 2nd ed., chap. 3. John Wiley & Sons; 2000.
11. Knight K. Mathematical Statistics. Chap. 3, p. 5. Chapman & Hall/CRC; 2000.
12. Karlis D, Xekalaki E. Choosing initial values for the EM algorithm for finite mixtures. Comput Statist Data Anal. 2003;41:577–90.
13. Schlattmann P. Estimating the number of components in a finite mixture model: the special case of homogeneity. Comput Statist Data Anal. 2003;41:441–51.
14. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Control. 1974;19:716–23.
15. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–64.
16. Goldstein H. The Design and Analysis of Longitudinal Studies. London: Academic Press; 1979.
17. McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. John Wiley & Sons; 1997.
18. Louis TA. Finding the observed information matrix when using the EM algorithm. J Roy Statist Soc Ser B. 1982;44:226–33.
19. Letenneur L, Commenges D, Dartigues JF, Barberger-Gateau P. Incidence of dementia and Alzheimer's disease in elderly community residents of southwestern France. Int J Epidemiol. 1994;23:1256–61.
20. Jacqmin-Gadda H, Fabrigoule C, Commenges D, Dartigues JF. A 5-year longitudinal study of the Mini Mental State Examination in normal aging. Am J Epidemiol. 1997;145:498–506.
21. Letenneur L, Gilleron V, Commenges D, Helmer C, Orgogozo JM, Dartigues JF. Are sex and educational level independent predictors of dementia and Alzheimer's disease? Incidence data from the PAQUID project. J Neurol Neurosurg Psychiatry. 1999;66:177–83.
