Abstract
A variety of methods in the literature seek to make iterative estimation algorithms more manageable by breaking the iterations into a larger number of simpler or faster steps. Algorithms that deal at each step with a proper subset of the parameters are called partitioned algorithms in this paper. Partitioned algorithms in effect replace the original estimation problem with a series of problems of lower dimension. The purpose of the paper is to characterize some of the circumstances under which this dimension reduction leads to significant benefits.
Four types of partitioned algorithms are distinguished: reduced objective function methods, nested (partial Gauss-Seidel) iterations, zigzag (full Gauss-Seidel) iterations, and leapfrog (non-simultaneous) iterations. Emphasis is given to Newton-type methods using analytic derivatives, but a nested EM algorithm is also given. Nested Newton methods are shown to be equivalent to applying the same Newton method to the reduced objective function, and are applied to separable regression and generalized linear models. Nesting is shown generally to improve the convergence of Newton-type methods, both by improving the quadratic approximation to the log-likelihood and by improving the accuracy with which the observed information matrix can be approximated. Nesting is recommended whenever a subset of the parameters is relatively easy to estimate. The zigzag method is shown to produce a stable but generally slow iteration; it is fast, and recommended, when the parameter subsets have approximately uncorrelated estimates. The leapfrog iteration has fewer guaranteed properties in general, but behaves like nesting and zigzagging when the parameter subsets are orthogonal.
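To make the distinction between zigzagging and nesting concrete, the following is a minimal sketch in Python/NumPy for a toy separable model y ≈ a0 + a1·exp(−b·x), which is linear in (a0, a1) and nonlinear in b. The model, starting values, and finite-difference derivatives are illustrative assumptions, not the paper's implementation (which uses analytic derivatives): the zigzag step updates b with (a0, a1) frozen, while the nested step applies Newton's method to the reduced objective S(b) obtained by profiling out the linear parameters.

```python
# Sketch (assumed example, not the paper's code) of two partitioned schemes
# for the separable regression model y ~= a0 + a1 * exp(-b * x).
import numpy as np

def design(b, x):
    # Model matrix for the parameters (a0, a1) that enter linearly, at a given b.
    return np.column_stack([np.ones_like(x), np.exp(-b * x)])

def zigzag(x, y, b, n_iter=200):
    # Zigzag (full Gauss-Seidel): exact linear solve for (a0, a1), then one
    # Gauss-Newton step for b with (a0, a1) held fixed.
    for _ in range(n_iter):
        F = design(b, x)
        a = np.linalg.lstsq(F, y, rcond=None)[0]
        r = y - F @ a                    # residuals at the current (a, b)
        J = -a[1] * x * np.exp(-b * x)   # d(model)/db
        b += (J @ r) / (J @ J)           # scalar Gauss-Newton update
    a = np.linalg.lstsq(design(b, x), y, rcond=None)[0]
    return a, b

def reduced_ss(b, x, y):
    # Reduced objective S(b): profile the linear parameters out by solving
    # the inner least-squares problem at each trial b.
    F = design(b, x)
    a = np.linalg.lstsq(F, y, rcond=None)[0]
    r = y - F @ a
    return r @ r

def nested(x, y, b, n_iter=20, h=1e-4):
    # Nested iteration: Newton's method applied to the reduced objective S(b).
    # Central differences are used purely for brevity in this sketch.
    for _ in range(n_iter):
        s0 = reduced_ss(b, x, y)
        sp = reduced_ss(b + h, x, y)
        sm = reduced_ss(b - h, x, y)
        grad = (sp - sm) / (2 * h)
        hess = (sp - 2 * s0 + sm) / h**2
        b -= grad / hess                 # assumes hess > 0 near the optimum
    a = np.linalg.lstsq(design(b, x), y, rcond=None)[0]
    return a, b

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 4.0, 60)
    y = 1.0 + 2.0 * np.exp(-1.5 * x) + 0.02 * rng.standard_normal(x.size)
    print("zigzag:", zigzag(x, y, b=1.0))
    print("nested:", nested(x, y, b=1.0))
```

Both schemes reduce each step to problems of lower dimension; the nested version works on the one-dimensional reduced objective, in line with the equivalence between nested Newton methods and Newton's method on the reduced objective noted above.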
Cite this article
Smyth, G.K. Partitioned algorithms for maximum likelihood and other non-linear estimation. Stat Comput 6, 201–216 (1996). https://doi.org/10.1007/BF00140865