Comput Methods Programs Biomed. Author manuscript; available in PMC 2007 Jul 10.
PMCID: PMC1913221; HALMS: HALMS130037; PMID: 15848271

Estimation of linear mixed models with a mixture of distribution for the random effects

Abstract

The aim of this paper is to propose an algorithm to estimate linear mixed models when the random-effects distribution is a mixture of Gaussians. This heterogeneous linear mixed model relaxes the classical Gaussian assumption for the random effects and, when used for longitudinal data, can highlight distinct patterns of evolution. The observed likelihood is maximized using a Marquardt algorithm rather than the EM algorithm frequently used for mixture models; the EM algorithm is computationally expensive and provides neither good convergence criteria nor direct estimates of the variance of the parameters. The proposed method also makes it possible to classify subjects according to the estimated profiles by computing the posterior probabilities of belonging to each component. The use of the heterogeneous linear mixed model is illustrated through a study of the different patterns of cognitive evolution in the elderly. HETMIXLIN is a free Fortran90 program available on the web site: http://www.isped.u-bordeaux2.fr.

Keywords: Aged, Aging, Algorithms, Child, Cognition, Computers, Female, Humans, Linear Models, Longitudinal Studies, Normal Distribution, Software
Keywords: Heterogeneous mixed model, Mixture model, Longitudinal data, Newton-Raphson-like algorithm, Cognitive ageing

1 Introduction

Many longitudinal studies consist of assessing changes over time in a marker measured repeatedly on each participant. These analyses are generally performed with mixed models [1], which account for the within-subject correlation and the between-subject variability of the marker course. However, such a model relies on the strong assumption that the random effects are sampled from a single multivariate Gaussian distribution, which means that the marker course is homogeneous among all the subjects.

To assess this assumption, Verbeke and Lesaffre [2] proposed a mixed model with a mixture of multivariate Gaussians on the random effects. This heterogeneous linear mixed model relaxes the normality assumption for the random effects and also makes it possible to highlight distinct evolutions of the marker and to classify the subjects according to these different patterns of evolution.

In Verbeke and Lesaffre's work, as in more recent papers [3,4], mixed models with a mixture on the random-effects distribution were estimated using an EM algorithm [5]. For instance, Spiessens and Verbeke [3] recently proposed a free SAS-macro (HETNLMIXED) using the EM algorithm and the NLMIXED procedure for the optimization in the M step. This SAS-macro is an extension of the SAS-macro HETMIXED, which was developed earlier for estimating heterogeneous linear mixed models using the MIXED procedure [6]. To our knowledge, HETNLMIXED and its first version HETMIXED are the only freely available programs developed for estimating heterogeneous mixed models. The first version, HETMIXED, proved to be very slow and limited to small samples because of the handling of very large matrices and prohibitive computation times; it is not considered further in this work. HETNLMIXED was developed to reduce these computational problems and to allow estimation of both linear and generalized linear models. However, in the linear case, this SAS-macro has the drawback of computing numerically an integral over the random effects although it has a closed form, and the macro is thus limited to a small number of random effects. We have also observed convergence problems when using the macro with large samples, except for very simple models.

Moreover, the EM algorithm used in these macros has some general drawbacks. In particular, it does not have good convergence criteria: convergence is assessed only through the lack of progression of the likelihood or of the parameter estimates [7]. Furthermore, convergence is slow [8] and the EM algorithm does not provide direct estimates of the variance of the parameters. In the particular case of a heterogeneous mixed model, the M step also requires the estimation of a homogeneous mixed model, which is computationally expensive.

Therefore, the first aim of this paper is to propose a program for estimating more general heterogeneous linear mixed models that is suitable for large samples. The proposed program, HETMIXLIN, is written in Fortran90 and uses a direct maximization of the likelihood via a Marquardt optimization algorithm. The second objective of this paper is to illustrate the use of the heterogeneous linear mixed model through a study of the different patterns of evolution in cognitive ageing.

2 Computational methods and theory

2.1 The heterogeneous linear mixed model

Let Y_i = (Y_{i1}, …, Y_{in_i}) be the response vector for the n_i measurements of subject i, with i = 1, …, N. The linear mixed model [1] for the response vector Y_i is defined as:

Y_i = X_i\beta + Z_i u_i + \varepsilon_i
(1)

X_i is an n_i × p design matrix for the p-vector of fixed effects β, and Z_i is an n_i × q design matrix associated with the q-vector of random effects u_i, which represents the subject-specific regression coefficients. The errors ε_i are assumed to be normally distributed with mean zero and covariance matrix σ²I_{n_i}, and to be independent of the vector of random effects u_i.

In a homogeneous mixed model [1], u_i is normally distributed with mean μ and covariance matrix D, i.e.

u_i \sim N(\mu, D)
(2)

In the heterogeneous mixed model [2–4], u_i is assumed to follow a mixture of G multivariate Gaussians with different means (μ_g)_{g=1,…,G} and a common covariance matrix D, i.e.

u_i \sim \sum_{g=1}^{G} \pi_g N(\mu_g, D)
(3)

Each component g of the mixture has probability π_g, and the (π_g)_{g=1,…,G} satisfy the following conditions:

0 \le \pi_g \le 1 \quad \forall g = 1,\dots,G \quad \text{and} \quad \sum_{g=1}^{G} \pi_g = 1
(4)

In this work, we propose a slightly more general formulation of the model described in (1), in which the effect of some covariates may depend on the components of the mixture and some of the random effects may have a common mean whatever the mixture component. Thus, the design matrix X_i is split into X_{1i}, associated with the vector β of fixed effects common to all the components, and X_{2i}, associated with the vectors δ_g of fixed effects specific to the components. The design matrix Z_i is similarly split into Z_{1i}, associated with the vector v_i of random effects following a single Gaussian distribution, and Z_{2i}, associated with the vector u_i of random effects following a mixture of Gaussian distributions. The model is then written as:

Y_i = X_{1i}\beta + \sum_{g=1}^{G} w_{ig} X_{2i}\delta_g + Z_{1i}v_i + Z_{2i}u_i + \varepsilon_i
(5)

where w_{ig} equals 1 if subject i belongs to component g and 0 otherwise (see section 2.2), v_i ~ N(0, D_v) and u_i ~ Σ_{g=1}^{G} π_g N(μ_g, D_u); given the component g, the conditional distribution of the vector (v_i′, u_i′)′ is N((0′, μ_g′)′, D) with D = \begin{pmatrix} D_v & D_{vu} \\ D_{uv} & D_u \end{pmatrix}.
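To make the structure of this model concrete, the short Python sketch below simulates one subject's response vector: it draws the latent component with probabilities π_g, then draws (v_i, u_i) from the corresponding Gaussian and adds the component-specific fixed part X_{2i}δ_g, consistently with the conditional mean given in Eq. (7) of the next subsection. This is an illustration only, not part of HETMIXLIN; all function and variable names are ours.

import numpy as np

def simulate_subject(X1, X2, Z1, Z2, beta, delta, mu, pi, Dv, Du, Dvu, sigma2, rng):
    """Simulate one response vector Y_i from the heterogeneous linear mixed model.

    delta : list of G component-specific fixed-effect vectors delta_g
    mu    : list of G component-specific means mu_g of u_i
    pi    : the G component probabilities
    Dv, Du, Dvu : blocks of the joint covariance matrix D of (v_i, u_i)
    """
    g = rng.choice(len(pi), p=pi)                      # latent component (w_ig = 1)
    D = np.block([[Dv, Dvu], [Dvu.T, Du]])             # joint covariance of (v_i, u_i)
    mean = np.concatenate([np.zeros(Dv.shape[0]), mu[g]])
    b = rng.multivariate_normal(mean, D)               # (v_i, u_i) given component g
    v, u = b[:Dv.shape[0]], b[Dv.shape[0]:]
    eps = rng.normal(0.0, np.sqrt(sigma2), size=X1.shape[0])
    return X1 @ beta + X2 @ delta[g] + Z1 @ v + Z2 @ u + eps, g

# Example call (shapes only): rng = np.random.default_rng(0); Y_i, g = simulate_subject(...)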

2.2 Likelihood

Following previous works [3,4], we define w_{ig} as the unobserved variable indicating whether subject i belongs to component g, with P(w_{ig} = 1) = π_g. The density of the vector y_i can then be written as:

f_i(y_i) = \sum_{g=1}^{G} \pi_g f(y_i \mid w_{ig} = 1)
(6)

Given w_{ig}, y_i follows a linear mixed model, and the density f(y_i | w_{ig} = 1), denoted φ_{ig}, is the multivariate Gaussian density with mean E_{ig} and covariance matrix V_i given by:

E_{ig} = E(Y_i \mid w_{ig}=1) = X_{1i}\beta + X_{2i}\delta_g + Z_{2i}\mu_g \quad \text{and} \quad V_i = \mathrm{Var}(Y_i \mid w_{ig}=1) = Z_i D Z_i' + \sigma^2 I_{n_i}
(7)

Let now θ be the vector of the m parameters of the model. θ contains ψ = (β′, (δ_g′)_{g=1,…,G}, (μ_g′)_{g=1,…,G}, Vec(D)′, σ²)′ and π, the vector of the G − 1 first component probabilities (π_g)_{g=1,…,G−1}. Note that π_G is entirely determined by π as 1 − Σ_{g=1}^{G−1} π_g. Vec(D) represents the vector of the upper triangular elements of D. The estimates of θ are obtained as the vector θ̂ that maximizes the observed log-likelihood:

L(Y; \theta) = \sum_{i=1}^{N} \ln(f_i(y_i)) = \sum_{i=1}^{N} \ln\left(\sum_{g=1}^{G} \pi_g \varphi_{ig}(y_i)\right) = \sum_{i=1}^{N}\left[ -\frac{n_i}{2}\ln(2\pi) - \frac{1}{2}\ln|V_i| + \ln\left(\sum_{g=1}^{G} \pi_g \, e^{-\frac{1}{2}(Y_i - E_{ig})' V_i^{-1} (Y_i - E_{ig})}\right)\right]
(8)
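For illustration, a minimal Python sketch of the computation of the observed log-likelihood (8) is given below. It is not taken from HETMIXLIN (which is written in Fortran90); the use of scipy and of a log-sum-exp over components are our own choices for numerical stability.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def observed_loglik(y_list, E_list, V_list, pi):
    """Observed log-likelihood (8).

    y_list : list of N response vectors y_i
    E_list : list of N arrays of shape (G, n_i) holding the means E_ig of Eq. (7)
    V_list : list of N covariance matrices V_i of Eq. (7)
    pi     : the G component probabilities
    """
    logpi = np.log(pi)
    total = 0.0
    for y, E, V in zip(y_list, E_list, V_list):
        # ln( sum_g pi_g * phi_ig(y_i) ), with phi_ig the N(E_ig, V_i) density
        log_terms = [logpi[g] + multivariate_normal.logpdf(y, mean=E[g], cov=V)
                     for g in range(len(pi))]
        total += logsumexp(log_terms)
    return total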

2.3 Estimation procedure

We propose to maximize directly the observed log-likelihood (8) using a modified Marquardt optimization algorithm [9], a Newton-Raphson-like algorithm [10]. The diagonal of the Hessian at iteration k, H^{(k)}, is inflated to obtain a positive definite matrix H^{*(k)} = (H^{*(k)}_{ij}) with H^{*(k)}_{ii} = H^{(k)}_{ii} + λ[(1 − η)H^{(k)}_{ii} + η tr(H^{(k)})] and H^{*(k)}_{ij} = H^{(k)}_{ij} if i ≠ j. Initial values for λ and η are λ = 0.01 and η = 0.01; they are reduced when H* is positive definite and increased if not. The estimates θ^{(k)} are then updated to θ^{(k+1)} using the current modified Hessian H^{*(k)} and the current gradient g(θ^{(k)}) according to the formula:

\theta^{(k+1)} = \theta^{(k)} - \alpha \big(H^{*(k)}\big)^{-1} g(\theta^{(k)})
(9)

where, if necessary, α is modified to ensure that the log-likelihood is improved at each iteration.
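The sketch below illustrates one iteration of such a modified Marquardt step. It is written in Python for readability and in terms of the negative log-likelihood, so that the inflated Hessian is positive definite near the optimum; the absolute values added in the inflation and the specific updating rules for λ and α are our own simplifications, not necessarily those implemented in HETMIXLIN.

import numpy as np

def marquardt_step(theta, negloglik, grad, hess, lam=0.01, eta=0.01):
    """One Marquardt update of Eq. (9), written for the negative log-likelihood."""
    H = hess(theta)          # Hessian of -L at theta
    g = grad(theta)          # gradient of -L at theta
    while True:
        H_star = H.copy()
        # Inflate the diagonal: H*_ii = H_ii + lam * ((1 - eta)|H_ii| + eta |tr(H)|)
        # (absolute values are a safeguard added here; off-diagonal terms are unchanged)
        H_star[np.diag_indices_from(H_star)] = (
            np.diag(H) + lam * ((1 - eta) * np.abs(np.diag(H)) + eta * abs(np.trace(H))))
        try:
            np.linalg.cholesky(H_star)   # succeeds iff H_star is positive definite
            break
        except np.linalg.LinAlgError:
            lam *= 10.0                  # inflate more and retry
    step = np.linalg.solve(H_star, g)
    alpha, f0 = 1.0, negloglik(theta)
    theta_new = theta - alpha * step
    while negloglik(theta_new) > f0 and alpha > 1e-8:
        alpha /= 2.0                     # reduce alpha until -L decreases, i.e. L improves
        theta_new = theta - alpha * step
    return theta_new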

To ensure that the covariance matrix D is positive definite, we maximize the log-likelihood on the non-zero elements of U, the Cholesky factor of D (i.e. U′U = D) [7]. Furthermore, to deal with the constraints (4) on π, we use the transformed parameters (γ_g)_{g=1,…,G−1} with:

\gamma_g = \ln\!\left(\frac{\pi_g}{\pi_G}\right)
(10)

Standard errors of the elements of D and of (π_g)_{g=1,…,G−1} are computed by the Δ-method [11], while standard errors of the other parameters are computed directly using the inverse of the observed Hessian matrix.
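As an illustration, the following Python sketch implements the transformation (10), its inverse, and a generic Δ-method propagation of a covariance matrix; the forward-difference Jacobian is our own shortcut and the function names are hypothetical.

import numpy as np

def gamma_from_pi(pi):
    """gamma_g = ln(pi_g / pi_G) for g = 1, ..., G-1 (Eq. (10))."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[:-1] / pi[-1])

def pi_from_gamma(gamma):
    """Inverse transform: pi_g = exp(gamma_g) / (1 + sum_l exp(gamma_l)), pi_G = 1 - sum pi_g."""
    e = np.exp(np.asarray(gamma, dtype=float))
    return np.append(e, 1.0) / (1.0 + e.sum())

def delta_method_cov(transform, estimate, cov, h=1e-6):
    """Delta-method covariance of transform(estimate): J cov J', with a numerical Jacobian J."""
    estimate = np.asarray(estimate, dtype=float)
    f0 = np.asarray(transform(estimate))
    J = np.empty((f0.size, estimate.size))
    for j in range(estimate.size):
        step = np.zeros_like(estimate)
        step[j] = h
        J[:, j] = (np.asarray(transform(estimate + step)) - f0) / h
    return J @ cov @ J.T

# Example: standard errors of the pi_g from the covariance of the gamma_g
# se_pi = np.sqrt(np.diag(delta_method_cov(pi_from_gamma, gamma_hat, cov_gamma)))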

Convergence is reached when the three following criteria are satisfied: Σ_{j=1}^{m} (θ_j^{(k)} − θ_j^{(k−1)})² ≤ ε_a, |L^{(k)} − L^{(k−1)}| ≤ ε_b and g(θ^{(k)})′ (H^{(k)})^{−1} g(θ^{(k)}) ≤ ε_d. The default values are ε_a = 10^{−5}, ε_b = 10^{−5} and ε_d = 10^{−8}.
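Written out explicitly (illustrative Python, with our own variable names), the stopping rule reads:

import numpy as np

def converged(theta_new, theta_old, L_new, L_old, g, H,
              eps_a=1e-5, eps_b=1e-5, eps_d=1e-8):
    """Check the three convergence criteria on the parameters, the log-likelihood
    and the gradient (the last criterion uses the inverse of the Hessian H)."""
    crit_param = np.sum((theta_new - theta_old) ** 2) <= eps_a
    crit_lik   = abs(L_new - L_old) <= eps_b
    crit_grad  = g @ np.linalg.solve(H, g) <= eps_d
    return crit_param and crit_lik and crit_grad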

As the log-likelihood of a mixture model may have several maxima [8], we use a grid of initial values to find the global maximum. The multimodality of the log-likelihood in mixture models has often been discussed, and several authors have proposed strategies to choose the set of initial values [12], but none of them seems optimal in general. In our experience, the results were mainly sensitive to the initial values of (π_g)_{g=1,…,G−1} and (μ_g)_{g=1,…,G} and less sensitive to the other parameters (Vec(U), β and σ), for which the estimates from the homogeneous mixed model were good initial values.

A mixture model is estimated with a fixed number of components G, since otherwise the number of parameters in the model is unknown. To choose the number of components, one has to estimate models with different values of G and select the best model according to a test or a criterion. Some works favor a bootstrap approach to approximate the asymptotic distribution of the likelihood ratio test between models with different numbers of components [13], but this approach is computationally very demanding, in particular for mixture models with random effects. Criteria such as the Akaike Information Criterion (AIC) [14] or the Bayesian Information Criterion (BIC) [15] are often preferred; we use these criteria to select the optimal number of components.
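For reference, with L the maximized log-likelihood, m the number of parameters and N the number of subjects, AIC = −2L + 2m and BIC = −2L + m ln(N); penalizing by the number of subjects rather than the number of observations is an assumption, but it is consistent with the values reported in Tables 1 and 2. A trivial Python helper:

import numpy as np

def aic_bic(loglik, n_params, n_subjects):
    """AIC = -2L + 2m and BIC = -2L + m ln(N), N being the number of subjects."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_subjects)
    return aic, bic

# With the schoolgirl fits of Table 1 and our own count of free parameters (m = 9 for G = 2,
# m = 12 for G = 3, N = 20 subjects):
# aic_bic(-166.67, 9, 20)   -> approx. (351.3, 360.3)
# aic_bic(-165.94, 12, 20)  -> approx. (355.9, 367.8)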

2.4 A posteriori classification

After parameter estimation, mixture models allow subjects to be classified into the G components. The classification is based on the posterior probabilities (π_{ig})_{g=1,…,G} that subject i belongs to each of the G components. Using θ̂ = (ψ̂′, π̂′)′, these probabilities are obtained by Bayes' theorem [2–4] as:

\hat{\pi}_{ig} = P(w_{ig} = 1 \mid Y_i, \hat{\theta}) = \frac{\hat{\pi}_g \, \varphi_{ig}(\hat{\psi}, Y_i)}{\sum_{l=1}^{G} \hat{\pi}_l \, \varphi_{il}(\hat{\psi}, Y_i)}
(11)

Each subject i is then assigned to the component for which the posterior probability π̂_{ig} is the highest.
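A direct transcription of (11) could look as follows (Python sketch with illustrative names; log densities and a log-sum-exp are used for numerical stability):

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def posterior_probabilities(y, E, V, pi):
    """Posterior probabilities (pi_i1, ..., pi_iG) of Eq. (11) for one subject,
    and the assigned component (the one with the highest posterior probability).

    y  : response vector y_i
    E  : array (G, n_i) of conditional means E_ig
    V  : conditional covariance matrix V_i
    pi : estimated component probabilities
    """
    log_num = np.array([np.log(pi[g]) + multivariate_normal.logpdf(y, mean=E[g], cov=V)
                        for g in range(len(pi))])
    post = np.exp(log_num - logsumexp(log_num))   # normalise by sum_g pi_g phi_ig
    return post, int(np.argmax(post))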

3 Program description

The program requires two distinct input files: the data file described in appendix 1 and the parameter file named HETMIXLIN.inf which contains the information needed for the estimation of the model: the names of the data file and output files, the number of subjects, the description of the model (number of components G, dimension of the random effects, covariates X1, X2, Z1 and Z2 and covariance structure of D) and the initial values of the parameters. An example of the parameter file is given in appendix 2.

The main output file gives the final log-likelihood, the AIC, the BIC, the convergence criteria, the number of iterations and the parameter estimates with their standard errors, Wald statistics and 95% confidence intervals. The number of subjects classified in each component is also given.

Finally, another output file contains the posterior probabilities for each subject to belong to each class and the final class membership.

4 Applications

4.1 The height of schoolgirls

We consider the sample of 20 preadolescent schoolgirls introduced by Goldstein [16]. Verbeke and Lesaffre [2] and Komárek [6] modelled their height growth curves as a function of age from 6 to 10 years. Using the homogeneous mixed model, they showed that the height course of the girls differed significantly according to the category of height of their mother (small, medium and tall). They therefore used the heterogeneous linear mixed model, without introducing the height of the mother in the model, to try to highlight clusters of girls with distinct growth curves. In this work, we compare the results obtained using our program with those obtained with the HETNLMIXED SAS-macro, which uses the EM algorithm. The model is written as:

\text{Height}_{ij} = u_{0i} + u_{1i} \times \text{age}_{ij} + \varepsilon_{ij}
(12)

where u_i = (u_{0i}, u_{1i})′ ~ Σ_{g=1}^{G} π_g N(μ_g, D) with μ_g = (μ_{0g}, μ_{1g})′ and ε_{ij} ~ iid N(0, σ²).

We fitted the heterogeneous linear mixed model for two and three components. An extract of the data file and the parameter file for the model with two components are presented in appendices 1 and 2. The results for the model with two components of mixture obtained with our program HETMIXLIN and with the SAS-macro HETNLMIXED are shown in Table 1. The estimates obtained using the two methods are the same, but a difference is observed in the standard error estimates: the standard errors from HETNLMIXED are larger than those from HETMIXLIN. This difference between the two algorithms was also observed in the homogeneous case when comparing HETNLMIXED, the MIXED procedure, the NLMIXED procedure and the HETMIXLIN program. The latter three programs estimate standard errors by the inverse of the Hessian matrix, which estimates the Fisher information matrix [11], and led to the same standard error estimates. By contrast, HETNLMIXED uses an approximation of Louis's method based on the product of the expectations of the gradient of the complete likelihood [17], Louis's method [18] being itself an approximation of the observed Hessian matrix. This method appeared to overestimate the standard errors in this small sample. However, in our experience, this approximation of the observed Hessian matrix seemed to improve when the sample size increased. For instance, for a linear mixed model estimated on the 1,392 subjects of the PAQUID sample from section 4.2, the discrepancy was lower.

Table 1

Estimates and standard errors of the heterogeneous linear mixed model with two components of mixture for the height of schoolgirls, using HETMIXLIN (the proposed direct maximization using a Marquardt algorithm) and HETNLMIXED (Spiessens et al.'s SAS-macro using an EM algorithm), and estimates of the heterogeneous linear mixed model with three components of mixture using HETMIXLIN.

Parameter       | G = 2 HETNLMIXED     | G = 2 HETMIXLIN      | G = 3 HETMIXLIN
                | Estimate     SE*     | Estimate     SE**    | Estimate     SE**
π1              | 0.68         0.14    | 0.68         0.12    | 0.50         0.18
π2              | 0.32                 | 0.32                 | 0.30         0.11
π3              |                      |                      | 0.20
μ01             | 82.8         1.12    | 82.8         0.91    | 84.2         1.18
μ11             | 5.38         0.091   | 5.38         0.086   | 5.32         0.10
μ02             | 81.9         2.01    | 81.9         1.52    | 81.7         1.12
μ12             | 6.44         0.18    | 6.44         0.15    | 6.47         0.12
μ03             |                      |                      | 79.4         2.19
μ13             |                      |                      | 5.60         0.20
Var(u0i)        | 6.47         4.94    | 6.47         3.13    | 3.50         2.39
cov(u0i, u1i)   | 0.13         0.40    | 0.13         0.35    | 0.32         0.13
Var(u1i)        | 0.034        0.056   | 0.034        0.030   | 0.030        0.024
σ               | 0.69         0.10    | 0.69         0.063   | 0.68         0.06

−L              |                      | 166.67               | 165.94
AIC             |                      | 351.35               | 355.87
BIC             |                      | 360.32               | 367.82

*Standard errors obtained using Louis's method.
**Standard errors obtained using the inverse of the Hessian matrix and the Δ-method for the component probabilities and the variance parameters.

As the convergence of the two algorithms depends on the choice of the initial values, we fitted the model with both approaches using the same grid of 32 sets of initial values, which differed on all types of parameters. HETNLMIXED provided the global maximum 11 times out of the 32 tries, whereas our program found it 23 times. HETMIXLIN was also faster than the HETNLMIXED SAS-macro (several seconds compared with at least several minutes on a dual-Xeon 3.06 GHz with 1024 MB RAM).

The results for the model with three components of mixture are also shown in Table 1, but we cannot compare our results with those given by HETNLMIXED since it converged toward a non-positive definite D matrix. Indeed, HETNLMIXED uses the NLMIXED procedure, which does not constrain D to be positive definite.

4.2 Cognitive decline in the elderly

The second example illustrates the use of heterogeneous linear mixed models with our estimation method on a large data set such as those encountered in epidemiological studies. The aim of this analysis is to describe, in a cohort of elderly subjects, the heterogeneity of the evolution of the Mini-Mental State Examination (MMSE), the most important psychometric test to evaluate dementia and cognitive impairment, and to compare the classification of subjects derived from the mixture model with the dementia diagnosis. The MMSE score ranges from 0 to a maximum of 30 points.

Data come from the French prospective cohort study PAQUID, initiated in 1988 to study normal and pathological ageing [19]. The cohort includes 3,777 subjects aged 65 years and older who were living at home in southwestern France at baseline. Subjects were interviewed at baseline and were seen again 1 (T1), 3 (T3), 5 (T5), 8 (T8) and 10 (T10) years after the baseline visit (T0). At each visit, a battery of psychometric tests was completed and a diagnosis of dementia was made. In this analysis, we excluded data from T0 because of a learning effect previously described for the cognitive tests between T0 and T1 [20]. We studied the evolution of the MMSE between T1 and T8 for subjects free of dementia until T5 and compared the estimated classification with the dementia diagnosis at T8 and then with the health status at T10. We excluded subjects not seen at T8 to ensure that their diagnosis at this visit was available. This led to a sample of 1,392 subjects with between 1 and 4 MMSE measurements between T1 and T8.

The model is a quadratic function of time adjusted for covariates associated with cognitive evolution, in order to exclude heterogeneity introduced by known factors. The time scale (t_{ij} for subject i at visit j) is the negative of the delay between the measurement and the T8 visit (time is zero at the diagnosis time T8). We model the square root of the number of errors to satisfy the normality assumption of the error terms. The model is written as:

Y_{ij} = \sqrt{30 - \text{MMSE}_{ij}} = \beta' X_{ij} + u_{0i} + u_{1i} t_{ij} + u_{2i} t_{ij}^2 + \varepsilon_{ij}
(13)

where X_{ij} is the vector of covariates for subject i at visit j, including age, occupation, educational level, living place and interactions with time for age and educational level; u_i = (u_{0i}, u_{1i}, u_{2i})′ ~ Σ_{g=1}^{G} π_g N(μ_g, D) with μ_g = (μ_{0g}, μ_{1g}, μ_{2g})′ and ε_{ij} ~ iid N(0, σ²).

We fitted the heterogeneous linear mixed model with two components of mixture. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were largely improved compared with the homogeneous mixed model (ΔAIC = 98.7; ΔBIC = 77.8). Table 2 displays the parameter estimates for the homogeneous and the heterogeneous linear mixed models obtained with HETMIXLIN. The heterogeneous linear mixed model distinguished two different MMSE courses (Figure 1). The first class, including 98% of the sample, follows a linear evolution with a slight decline of 0.42 points (σ = 0.15) per year. The second class, including 2% of the sample, follows a nonlinear evolution with a decline that accelerates from the second follow-up to the end.

Figure 1. Mean curves of the MMSE between T1 and T8 for the heterogeneous linear mixed model with two components of mixture, for a 70-year-old worker with no education living in Dordogne.

Solid line: first component with a probability of 98%

Dashed line: second component with a probability of 2%

Table 2

Estimates of the homogeneous linear mixed model and of the heterogeneous linear mixed model with two components of mixture, obtained with the HETMIXLIN program, for the MMSE evolution adjusted for age, occupation, educational level, living place and interactions with time for age and educational level.

Parameter       | Homogeneous model (G = 1)  | Heterogeneous model (G = 2)
                | Estimate      SE           | Estimate      SE
π1              | 1                          | 0.98          0.005
π2              |                            | 0.02
μ01             | −1.92         0.32         | −1.51         0.31
μ11             | −0.57         0.16         | −0.42         0.16
μ21             | −0.042        0.018        | −0.028        0.024
μ02             |                            | 0.89          0.36
μ12             |                            | 0.46          0.19
μ22             |                            | 0.055         0.028
Var(u0i)        | 0.43          0.025        | 0.32          0.024
cov(u0i, u1i)   | 0.074         0.010        | 0.034         0.0095
Var(u1i)        | 0.066         0.014        | 0.029         0.013
cov(u0i, u2i)   | 0.042         0.0073       | 0.028         0.0069
cov(u1i, u2i)   | 0.052         0.010        | 0.038         0.010
Var(u2i)        | 0.073         0.016        | 0.060         0.016
σ               | 0.45          0.009        | 0.45          0.009

−L              | 4662.9                     | 4609.5
AIC             | 9367.7                     | 9269.0
BIC             | 9477.8                     | 9400.0

We then evaluated whether the cognitive profiles highlighted by the model were associated with the dementia diagnosis. Among the 1,392 subjects, 26 (1.9%) were classified in the second component with the nonlinear decline (Table 3). Among them, 21 had a positive dementia diagnosis at T8. The positive predictive value (81%) and the specificity (99.6%) of this classification are high but the sensitivity is poor (31%): 47 of the 68 subjects diagnosed as demented at T8 were not detected by the model. These subjects were significantly less disabled (p < 0.0001) than the other demented subjects (23% of disabled people vs. 76%) and had a significantly lower educational level (p = 0.001): 51% had no education or no diploma from primary school vs. 9.5% in the demented group detected by the model. All of the 5 subjects without dementia but classified in the declining group were disabled, 2 died within the following two years and 1 had a positive dementia diagnosis at T10.

Table 3

Relationship between the classification derived from the heterogeneous linear mixed model with two components and the dementia diagnosis at T8.

Dementia diagnosis at T8   | Linear class | Nonlinear class | Total
Positive                   | 47           | 21              | 68
Negative                   | 1319         | 5               | 1324
Total                      | 1366         | 26              | 1392

The association between cognitive ageing and educational level is an important issue [21]: educational level could have a different effect on normal cognitive ageing and on the decline preceding dementia. As HETMIXLIN allows distinct covariate effects to be specified per component, we fitted the model in which the interactions between educational level and time (educational level × t and educational level × t²) differed according to the components. The improvement in the log-likelihood was not significant (Δ(−2L) = 3.4; p = 0.17). Thus the association between educational level and cognitive evolution appears to be similar in the two subpopulations. Moreover, the discrimination of demented people was not improved with this model.

Because of the limitations of HETNLMIXED explained in section 1, we did not compare our approach with this program on the PAQUID data set. However, as Newton-Raphson algorithms have been criticized for their global convergence behavior compared with the EM algorithm [8], we compared the convergence performance of our program with that of an EM algorithm we developed in Fortran90. The comparison was thus free from the limitations due to the SAS environment and to the use of the NLMIXED procedure. The EM algorithm we developed uses a Marquardt optimization in the M step (convergence criteria: ε_a = 10^{−2}, ε_b = 10^{−2} and ε_d = 10^{−3}), and global convergence is reached when two successive values of the likelihood differ by less than 10^{−8}. This algorithm was first tested on the schoolgirl data. It was faster and converged more often than HETNLMIXED: the global maximum was reached 22 times out of the 32 tries with a mean computational time of around 30 seconds (versus 11 times out of the 32 tries and at least several minutes for HETNLMIXED). On the PAQUID data set, HETMIXLIN and the EM algorithm we implemented led to the same parameter estimates. Among the 15 sets of initial values, the two programs provided the global maximum an equivalent number of times (9 times for HETMIXLIN vs. 10 times for the EM algorithm) but HETMIXLIN was much faster: the CPU time was less than 10 minutes for HETMIXLIN and more than 2 hours for the EM algorithm.

5 Availability of the program and hardware specification

The program HETMIXLIN is written in Fortran90 and all the subroutines needed in the program are provided. The Fortran source code HETMIXLIN.f, an example of HETMIXLIN.inf, a documentation HETMIXLIN.pdf and the example data file for the schoolgirls are available at no charge on the web site: http://www.isped.u-bordeaux2.fr.

Two versions are provided on the web site: one for Unix and one for Windows. The Windows version includes an executable file (a DOS application) and does not require a Fortran90 compiler, whereas the Unix version needs to be compiled. The Unix version has been tested with the Intel Fortran Compiler for Linux (versions 7 and 8), the Compaq Fortran90 compiler for Alpha and Forte Developer 6 update 2 on Solaris SPARC. Examples of the compilation command are given in the documentation HETMIXLIN.pdf.

6 Conclusion

We proposed in this paper a Newton-Raphson-like algorithm to estimate heterogeneous linear mixed models. The main advantages of Newton-Raphson-like algorithms are their speed of convergence, the availability of good convergence criteria based on the derivatives of the likelihood, and direct estimates of the variance of the parameters via the inverse of the Hessian matrix. Moreover, using a simple modification of the Marquardt algorithm, we ensure the monotonicity of the algorithm, which is considered a main advantage of the EM algorithm [8].

We compared our program HETMIXLIN with a SAS-macro developed by Spiessens et al. using an EM algorithm. This SAS-macro can estimate heterogeneous generalized linear mixed models, but when the model is linear it has the drawback of computing numerically an integral over the random effects although it has a closed form. Our program HETMIXLIN can estimate more complex linear models (models with a larger number of mixture components, a larger number of random effects and more covariate effects depending on the mixture components), is suitable for much larger samples, and converges faster.

This paper also illustrates the usefulness of heterogeneous linear mixed models in a study of cognitive ageing. These models make it possible to highlight various evolution profiles while taking covariates into account. Cross-classifying the groups defined by the model with clinical events in the following years makes it possible to evaluate whether the cognitive profiles are associated with different clinical evolutions.

As a conclusion, we hope this work will improve the availability and the use of heterogeneous linear mixed models.

Appendix 1: Extract of the schoolgirl data file (first two subjects)

1←identification number of the unit (subject)
5←number of measures
111 116.4 121.7 126.3 130.5←row vector of the ni responses
1 1 1 1 1←row vector of the first covariate
6 7 8 9 10←row vector of the second covariate
2←identification number of the next unit (subject)
5←number of measures
110 115.8 121.5 126.6 131.4←row vector of the ni responses
1 1 1 1 1←row vector of the first covariate
6 7 8 9 10←row vector of the second covariate
3←identification number of the next unit (subject)
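To clarify this layout, the following small Python reader (our own sketch, not part of the HETMIXLIN distribution) parses a data file in the format shown above, assuming the annotations after the arrows are not present in the actual file and that each subject block contains one row per covariate:

def read_hetmixlin_data(path, n_cov):
    """Read a HETMIXLIN-style data file: for each subject, an identifier line,
    the number of measures, one row of responses, then one row per covariate."""
    subjects = []
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    i = 0
    while i < len(lines):
        ident = lines[i][0]                          # identification number of the unit
        n_i = int(lines[i + 1][0])                   # number of measures
        y = [float(v) for v in lines[i + 2][:n_i]]   # row vector of the n_i responses
        covs = [[float(v) for v in lines[i + 3 + k][:n_i]] for k in range(n_cov)]
        subjects.append({"id": ident, "y": y, "covariates": covs})
        i += 3 + n_cov
    return subjects

# Example: read_hetmixlin_data("schoolgirls.txt", n_cov=2)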

Appendix 2: Example of the parameter file

An example of HETMIXLIN.inf used in the application on the height of schoolgirls is given below. Note that each requested piece of information is preceded by a line describing it.

→ Filename for the data

schoolgirls.txt

→Filename for the output

girls.out

→Title of the procedure (in inverted commas)

‘G=2: school girls’

→Number of units (subjects)

20

→Number of mixture components (G) and, if and only if G>1, the initial values for the first G-1 component probabilities on the following line and the filename for the posterior probabilities on the line after

2

0.5

p.out

→Number of explanatory variables (including the intercept) in the data file

2

→Indicator that the explanatory variable is in the model (1 if present 0 if not)

1 1

→Indicator of random effect for each variable in the model (variables included in Z1 or Z2)

1 1

→Indicator of mixture for each variable in the model (variables included in X2 or Z2)

1 1

→Initial values for the fixed effects. First, the initial values for the common fixed effects (without mixture) in the same order as in the data file, then the initial values for the covariates with a mixture (G values per covariate), e.g. b1 b3 b21 b22 b23 b41 b42 b43 for a mixture on the second and fourth covariates and G=3

86 80 5 7

→Indicator of the random-effect covariance matrix structure (0 if unstructured matrix/1 if diagonal matrix)

0

→Initial values for the variance-covariance parameters of the random effects (1/2 superior matrix column by column)

3 1 1

→Initial value for the variance of the independent Gaussian errors

1

References

1. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74.
2. Verbeke G, Lesaffre E. A linear mixed-effects model with heterogeneity in the random-effects population. JASA. 1996;91:217–21.
3. Spiessens B, Verbeke G, Komárek A. A SAS-macro for the classification of longitudinal profiles using mixtures of normal distributions in nonlinear and generalized linear models. 2002. http://www.med.kuleuven.ac.be/biostat/research/software.htm.
4. Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–9.
5. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Statist Soc Ser B. 1977;39:1–38.
6. Komárek A. A SAS-macro for linear mixed models with a finite normal mixture as random-effects distribution. 2001. http://www.med.kuleuven.ac.be/biostat/research/software.htm.
7. Lindstrom MJ, Bates DM. Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. JASA. 1988;83:1014–22.
8. Redner RA, Walker HF. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review. 1984;26:195–239.
9. Marquardt D. An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math. 1963;11:431–41.
10. Fletcher R. Practical Methods of Optimization. 2nd ed., chap. 3. John Wiley & Sons; 2000.
11. Knight K. Mathematical Statistics. Chap. 3, p. 5. Chapman & Hall/CRC; 2000.
12. Karlis D, Xekalaki E. Choosing initial values for the EM algorithm for finite mixtures. Comput Statist Data Anal. 2003;41:577–90.
13. Schlattmann P. Estimating the number of components in a finite mixture model: the special case of homogeneity. Comput Statist Data Anal. 2003;41:441–51.
14. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Control. 1974;19:716–23.
15. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–64.
16. Goldstein H. The Design and Analysis of Longitudinal Studies. London: Academic Press; 1979.
17. McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. John Wiley & Sons; 1997.
18. Louis TA. Finding the observed information matrix when using the EM algorithm. J Roy Statist Soc Ser B. 1982;44:226–33.
19. Letenneur L, Commenges D, Dartigues JF, Barberger-Gateau P. Incidence of dementia and Alzheimer's disease in elderly community residents of southwestern France. Int J Epidemiol. 1994;23:1256–61.
20. Jacqmin-Gadda H, Fabrigoule C, Commenges D, Dartigues JF. A 5-year longitudinal study of the Mini Mental State Examination in normal aging. Am J Epidemiol. 1997;145:498–506.
21. Letenneur L, Gilleron V, Commenges D, Helmer C, Orgogozo JM, Dartigues JF. Are sex and educational level independent predictors of dementia and Alzheimer's disease? Incidence data from the PAQUID project. J Neurol Neurosurg Psychiatry. 1999;66:177–83.
