Abstract
In this work we propose a novel EM method for the estimation of nonlinear nonparametric mixed-effects models, aimed at unsupervised classification. We perform simulation studies in order to evaluate the algorithm performance and we apply this new procedure to a real dataset.




Similar content being viewed by others
References
Aitkin M (1996a) A general maximum likelihood analysis of overdispersion in generalized linear models. Stat Comput 6:251–262
Aitkin M (1999b) A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55:117–128
Antic J, Laffont CM, Chafaï D, Concordet D (2009) Comparison of nonparametric methods in nonlinear mixed effect models. Comput Stat Data Anal 53(3):642–656
Davidian M, Gallant AR (1993) The nonlinear mixed effects model with a smooth random effects density. Biometrika 80(3):475–488
De Lalla C, Rinaldi A, Montagna D, Azzimonti L, Bernardo ME, Sangalli LM, Paganoni AM, Maccario R, Zecca M, Locatelli F, Dellabona P, Casorati G (2011) Invariant natural killer T-cell reconstitution in pediatric leukemia patients given HLA-haploidentical stem cell transplantation defines distinct CD4+ and CD4- subset dynamics and correlates with remission state. J Immunol 186(7):4490–4499
Einbeck J, Darnell R, Hinde J (2009) npmlreg: Nonparametric maximum likelihood estimation for random effect models. [Online] http://CRAN.R-project.org/package=npmlreg
Fox J (2002) Linear mixed models, appendix to an R and S-PLUS companion to applied regression
Gallant AR (1987) Nonlinear statistical models. Wiley, New York
Gibbs AL, Su FE (2002) On choosing and bounding probability metrics. Int Stat Rev 70(3):419–435
Goldstein H (1991) Nonlinear multilevel models, with an application to discrete response data. Biometrika 78(1):45–51
Hox JJ (1995) Applied multilevel analysis. TT-Publikaties, Amsterdam
Ieva F, Paganoni AM, Secchi P (2012) Mining administrative health databases for epidemiological purposes: a case study on acute myocardial infarctions diagnoses. In: Pesarin F, Torelli S (eds) Accepted for publication in advances in theoretical and applied statistics. Springer, Berlin. [Online] http://mox.polimi.it/it/progetti/pubblicazioni/quaderni/45-2010.pdf
Kuhn E, Lavielle M (2005) Maximum likelihood estimation in nonlinear mixed effect models. Comput Stat Data Anal 49(4):1020–1038
Lai TL, Shih MC (2003) Nonparametric estimation in nonlinear mixed-effects models. Biometrika 90(1): 1–13
Lindsay BG (1983a) The geometry of mixture likelihoods: a general theory. Ann Stat 11(1):86–94
Lindsay BG (1983b) The geometry of mixture likelihoods, part II: the exponential family. Ann Stat 11(3):783–792
Pinheiro JC, Bates DM (2000) Mixed-effects models in S and S-plus. Springer, New York
R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. [Online] http://www.R-project.org
Sheiner LB, Beal SL (1980) Evaluation of methods for estimating population pharmacokinetic parameters. III. Monoexponential model: routine clinical pharmacokinetic data. J Pharmacokinet Pharmacodyn 11(3):303–319
Schumitzky A (1991) Nonparametric EM algorithms for estimating prior distributions. Appl Math Comput 45(2):143–157
Vermunt JK (2004) An EM algorithm for the estimation of parametric and non-parametric hierarchical nonlinear models. Statistica Neerlandica 58(2):220–233
Walker S (1996) An EM algorithm for nonlinear random effects models. Biometrics 52(3):934–944
Wolfinger R (1993) Laplace’s approximation for nonlinear mixed models. Biometrika 80(4):791–795
Acknowledgments
The case study in Sect. 4 is within the Strategic Program “Exploitation, integration and study of current and future health databases in Lombardia for Acute Myocardial Infarction” supported by “Ministero del Lavoro, della Salute e delle Politiche Sociali” and by “Direzione Generale Sanità - Regione Lombardia”.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix A: Proof of increasing likelihood property of the EM algorithm
In Sect. 2.2 we propose an EM algorithm for the estimation of the parameters of model (1). The update of the parameter described by Eqs. (3) and (4) provides an increasing of the likelihood function (2), that is
where \(\beta ^{(up)}\) and \(\sigma ^{2(up)}\) are the updated fixed effects and error variance. The likelihood \(L(\varvec{\beta }^{(up)}, \sigma ^{2(up)}| \mathbf y )\) is computed summing up the random effects with respect to the updated discrete distribution \((\mathbf c _l^{(up)},\omega _l^{(up)})\) for \(l=1,\ldots ,M\).
Thanks to the definition of likelihood function (2) we have that
All these terms can be bounded above by the quantity
where
\(Q_i(\theta ,\theta )\) is analogously defined, and \(\theta =(\varvec{\beta }, \mathbf c _1,\ldots ,\mathbf c _M,\omega _1,\ldots , \omega _M, \sigma ^2)\).
This bound can be found thanks to the convexity of the logarithm since
Defining
we obtain, thanks to Eq. (9), an upper bound for the quantity of interest
We have now to show that \(\forall \theta \)
In order to prove this result we can show that, \(\forall \theta \) fixed, \(\theta ^{(up)}\) is defined as
Defining \(W_{il}\) as in Eq. (5) we obtain
The functionals \(J_1\) and \(J_2\) can be maximized separately. The update (3) for the weights of the random effects distribution \(\omega _1,\ldots ,\omega _M\) is obtained in closed form maximizing the functional \(J_1\).
The functional \(J_1\) can be written as
Imposing the gradient of the functional \(J_1\) equal to zero we obtain
that is equivalent to
Since \(\sum _{l=1}^M W_{il}=1\), we obtain \(\omega _l^{(up)}=\sum _{i=1}^N W_{il}/N\).
On the other hand the update (4) for the fixed effects \(\varvec{\beta }\), the error variance \(\sigma ^2\) and the support points of the random effects distribution \(\mathbf c _1,\ldots ,\mathbf c _M\) is obtained maximizing the functional \(J_2\) in an iterative way, described in Sect. 2.2 and in “Appendix B”.
Appendix B: Details on NLNPEM Algorithm
The NLNPEM is the following:
-
1.
Define a starting discrete distribution for random effects with support on \(M=N\) points \((\mathbf c _l^{(0)},\omega _l^{(0)})\) for \(l=1,\ldots ,M\), a starting estimate for the fixed effects \(\varvec{\beta }^{(0)}\) and for \(\sigma ^{2(0)}\) and the tolerance parameters \(D\) and \({\tilde{\omega }}\);
-
2.
given \((\mathbf c _l^{(k-1)}, \omega _l^{(k-1)})\) for \(l=1,\ldots ,M\), \(\varvec{\beta }^{(k-1)}\) and \(\sigma ^{2(k-1)}\), update the weights \(\omega _1^{(k)},\ldots , \omega _M^{(k)}\) of the random effect distribution, according to Eq. (3);
-
3.
-
(a)
initialize \(\mathbf c _l^{(k,0)}=\mathbf c _l^{(k-1)}\) for \(l=1,\ldots ,M, \varvec{\beta }^{(k,0)}=\varvec{\beta }^{(k-1)}\) and \(\sigma ^{2(k,0)}=\sigma ^{2(k-1)}\);
-
(b)
given \((\mathbf c _l^{(k,j-1)},\omega _l^{(k-1)})\) for \(l=1,\ldots ,M, \varvec{\beta }^{(k,j-1)}\) and \(\sigma ^{2(k,j-1)}\) update the \(M\) support points \(\mathbf c _1^{(k,j)},\ldots ,\mathbf c _M^{(k,j)}\) according to Eq. (6);
-
(c)
given \((\mathbf c _l^{(k,j)},\omega _l^{(k-1)})\) for \(l=1,\ldots ,M, \varvec{\beta }^{(k,j-1)}\) and \(\sigma ^{2(k,j-1)}\) maximize Eq. (4) with respect to \(\varvec{\beta }\) and \(\sigma ^2\) obtaining \(\varvec{\beta }^{(k,j)}\) and \(\sigma ^{2(k,j)}\);
-
(d)
iterate steps 3(b) and 3(c) until convergence and set \(\mathbf c _l^{(k)}=\mathbf c _l^{(k,j)}\) for \(l=1,\ldots ,M, \varvec{\beta }^{(k)}=\varvec{\beta }^{(k,j)}\) and \(\sigma ^{2(k)}=\sigma ^{2(k,j)}\);
-
(a)
-
4.
iterate steps 2 and 3 until convergence;
-
5.
reduce the support of the discrete distribution, according with the tuning parameters \(D\) and \({\tilde{\omega }}\);
-
6.
iterate steps 2, 3 and 5 until convergence.
The algorithm reaches convergence when parameters and discrete distribution stop changing or when there is no variation in the log-likelihood function.
Appendix C: Details on simulation study
1.1 Appendix C.1: The linear case
We simulated 8 datasets of linear curves grouped in a number of clusters that vary form 2 to 10. Different values of the error variance \(\sigma ^2\) have been chosen for each test set, in order to obtain noisy observations for each curve. Some examples of simulated data are shown in left panels of Fig. 1. Datasets addressed with the name “S” contain groups in which only slopes is random, “I” datasets contain groups where only intercept is random and “SI” datasets contain curves where both slope and intercept are random. The simulated datasets are then:
-
lin2S: 2 balanced groups, each one composed by 25 curves, with the same intercept (equal to \(4\)), 2 different slopes (\(\mathbf c = (c_1,c_2) = (1,2)\)) and \(\sigma = 1\);
-
lin2I: 2 balanced groups, each one composed by 25 curves with the same slope (equal to \(1\)), 2 different intercept (\(\mathbf c = (c_1,c_2) = (3,10)\)) and \(\sigma = 0.65\);
-
lin4SI: 4 balanced groups, each one composed by 25 curves, where location points \(\mathbf c = (\mathbf c _1,\mathbf c _2,\mathbf c _3,\mathbf c _4)\) are obtained from all possible combinations of 2 different slopes (equal to \(1\) and \(3\)) and 2 different intercepts (equal to \(40\) and \(60\)), i.e., \(\mathbf c _1 = (1,40)\), \(\mathbf c _2 = (1,60)\), \(\mathbf c _3 = (3,40)\) and \(\mathbf c _4 = (3,60)\) with \(\sigma = 1\);
-
lin3S: 3 unbalanced groups, composed by 24, 24 and 2 curves respectively, with the same intercept (equal to \(4\)), 3 different slopes (\( \mathbf c = (c_1,c_2,c_3) = (1,2,3.5)\)) and \(\sigma = 1\);
-
lin3I: 3 unbalanced groups, composed by 24, 24 and 2 curves respectively, with the same slope (equal to \(1\)), 3 different intercepts (\(\mathbf c = (c_1,c_2,c_3) = (2,7,14)\)) and \(\sigma = 1\);
-
lin9SI: 9 unbalanced groups, 6 of whom containing 24 curves and 3 containing 2 curves, where location points \(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8,c_9)\) are obtained from all possible combinations of 3 different slopes (equal to \(1\), \(4\) and \(7\)) and 3 different intercept (equal to \(20\), \(35\) and \(60\)) with \(\sigma = 1.5\);
-
lin10S: 10 balanced groups, each one composed by 50 curves with the same intercept (equal to \(1\)), 10 different slopes (\(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8,c_9,c_{10}) = (0.5,2,4,5.5,7.5,\) \(10,12,13.5,16,20)\)) and \(\sigma = 1.5\);
-
lin10I: 10 balanced groups, each one composed by 15 curves with the same slope (equal to \(1\)), 10 different intercepts (\(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8,c_9,c_{10}) = (1,5,10,15,20,25,30,\) \(35,40,45)\)) and \(\sigma = 1\).
1.2 Appendix C.2: The exponential case
We simulated 3 datasets of exponential growth curves where only asymptote varies and is considered as random. All datasets are then addressed with the name “A”. They are:
-
exp2A: 2 balanced groups, each one composed by 25 curves, with the same growth rate (\(\lambda =0.5\)), 2 different asymptotes (\(\mathbf c = (c_1,c_2) = (1,1.5)\)) and \(\sigma =0.04\);
-
exp3A: 3 unbalanced groups of 24, 24 and 2 curves respectively, with the same growth rate (\(\lambda =0.5\)), 3 different asymptotes (\( \mathbf c = (c_1,c_2,c_3) = (1,1.5,2.3)\)) and \(\sigma =0.04\);
-
exp10A: 10 balanced groups, each one composed by 5 curves, with the same growth rate (\(\lambda =0.5\)), 10 different asymptotes (\(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8,c_9,c_{10})\) \(=(1,1.25,1.5,\) \(1.75,2,2.25,2.5,2.75,3,3.25))\) and \(\sigma =0.04\).
1.3 Appendix C.3: The logistic case
We simulated 8 datasets of logistic growth curves. Datasets addressed with the name “A” represent random asymptote cases, “I” datasets contain groups where only inflection point is random and “AI” ones contain curves where both asymptote and inflection point are random. We then have:
-
logis2A: 2 balanced groups, each one composed by 25 curves, with \(\delta =6\), \(\gamma =1\), 2 different asymptotes (\(\mathbf c = (c_1,c_2) = (1,2)\)) and \(\sigma =0.04\);
-
logis2I: 2 balanced groups, each one composed by 25 curves, with \(\alpha =1\), \(\gamma =1\), 2 different inflection points (\(\mathbf c = (c_1,c_2) = (6,8)\)) and \(\sigma =0.04\);
-
logis4AI: 4 balanced groups, each one composed by 25 curves, where location points \(\mathbf c = (\mathbf c _1,\mathbf c _2,\mathbf c _3,\mathbf c _4)\) are obtained from all possible combinations of 2 different asymptotes (equal to \(1\) and \(2\)) and 2 different inflection points (equal to \(6\) and \(10\)), i.e., \(\mathbf c _1 = (1,6)\), \(\mathbf c _1 = (1,10)\), \(\mathbf c _1 = (2,6)\) and \(\mathbf c _4 = (2,10)\) with \(\gamma =1\) and \(\sigma =0.04\);
-
logis3A: 3 unbalanced groups of 24, 24 and 2 curves respectively, with \(\delta =6\), \(\gamma =1\), 3 different asymptotes (\( \mathbf c = (c_1,c_2,c_3) = (1,2,3.5)\)) and \(\sigma =0.04\);
-
logis3I: 3 unbalanced groups of 24, 24 and 2 curves respectively, with \(\alpha =1\), \(\gamma =1\), 3 different inflection points (\( \mathbf c = (c_1,c_2,c_3) = (6,8,11.5)\)) and \(\sigma =0.04\);
-
logis9AI: 9 unbalanced groups of curves (6 of whom containing 24 curves and 3 containing 2 curves), where location points \(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7, c_8,c_9)\) are obtained from all possible combinations of 3 different asymptotes (equal to \(1\), \(2\) and \(4\)) and 3 different inflection points (equal to \(6\), \(8\) and \(11.5\)) with \(\gamma =1\) and \(\sigma =0.04\);
-
logis10A: 10 balanced groups, each one composed by 5 curves, with \(\delta =6\), \(\gamma =1\), 10 different asymptotes (\(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7, c_8,c_9,c_{10}) = (1,1.25,1.5,1.75,2,2.25,2.5, 2.75,3,3.25))\) and \(\sigma =0.04\);
-
logis10I: 10 balanced groups, each one composed by 5 curves, with \(\alpha =1\), \(\gamma =1\), 10 different inflection points (\(\mathbf c = (c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8,c_9,c_{10}) = (4.5,5.5,7,8,9.5,10.5,12, 13,14.5,16))\) and \(\sigma =0.04\).
Appendix D: Comparison of results
Comparison of estimates carried out by npmlreg and NLNPEM method are reported here, for some cases of interest mentioned in the paper (Tables 4, 5, 6).
-
Linear case—Random-intercept case (lin2I)
-
Linear case—Random-slope case (lin3S)
-
Linear case—Random-intercept case (lin10I)
Rights and permissions
About this article
Cite this article
Azzimonti, L., Ieva, F. & Maria Paganoni, A. Nonlinear nonparametric mixed-effects models for unsupervised classification. Comput Stat 28, 1549–1570 (2013). https://doi.org/10.1007/s00180-012-0366-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-012-0366-5