Density estimation and comparison with a penalized mixture approach

  • Original Paper
Computational Statistics

Abstract

The paper presents a smooth density estimator based on penalized splines. The idea is to represent the unknown density by a convex mixture of basis densities, where the mixture weights are estimated subject to a penalty. The proposed method extends the work of Komárek and Lesaffre (Comput Stat Data Anal 52(7):3441–3458, 2008) to general density estimation. In simulations, the method performs convincingly compared with existing density estimation routines. The idea is further extended to let the density depend on a (factorial) covariate. With a binary group indicator, for instance, we can test for equality of the densities in the two groups. This provides a smooth alternative to the classical Kolmogorov-Smirnov test or an analysis of variance, and the test behaves stably and shows good power.
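The core idea (a convex mixture of basis densities with penalized weights) can be illustrated with a minimal sketch. It assumes Gaussian basis densities on an equally spaced grid, as in the penalized Gaussian mixture of Komárek and Lesaffre that the paper extends, with weights parametrized as c_k = exp(beta_k) / sum_j exp(beta_j), a second-order difference penalty on beta, a fixed smoothing parameter `lam`, and plain gradient ascent. The paper's spline-based basis, its penalty and its data-driven choice of the smoothing parameter may differ, and all function names and parameters here are hypothetical. The group-comparison test is not covered.

```python
import math
import random


def gaussian_basis(x, centers, sd):
    """Evaluate the K Gaussian basis densities phi_k(x); each one integrates to 1."""
    norm = sd * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((x - m) / sd) ** 2) / norm for m in centers]


def softmax(beta):
    """Map unconstrained beta_k to convex weights c_k = exp(beta_k) / sum_j exp(beta_j)."""
    top = max(beta)  # subtract the max for numerical stability
    e = [math.exp(b - top) for b in beta]
    s = sum(e)
    return [v / s for v in e]


def diff2_penalty_grad(beta):
    """Gradient of 0.5 * sum_k (beta_k - 2 beta_{k+1} + beta_{k+2})^2, i.e. D'D beta."""
    g = [0.0] * len(beta)
    for k in range(len(beta) - 2):
        d = beta[k] - 2.0 * beta[k + 1] + beta[k + 2]
        g[k] += d
        g[k + 1] -= 2.0 * d
        g[k + 2] += d
    return g


def fit_penalized_mixture(data, n_basis=15, lam=1.0, step=0.5, n_iter=500):
    """Maximise (1/n) sum_i log f(x_i) - lam/(2n) ||D beta||^2 by plain gradient ascent,
    where f(x) = sum_k c_k(beta) phi_k(x). Returns the weights and the fitted density."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / (n_basis - 1)
    centers = [lo + k * width for k in range(n_basis)]  # equally spaced centers
    sd = width  # assumption: basis sd equal to the center spacing
    phi = [gaussian_basis(x, centers, sd) for x in data]  # n x K, evaluated once
    n = len(data)
    beta = [0.0] * n_basis  # start from equal weights 1/K
    for _ in range(n_iter):
        c = softmax(beta)
        # d/d beta_k of the mean log-likelihood = mean responsibility of k minus c_k
        mean_resp = [0.0] * n_basis
        for row in phi:
            p = sum(ck * fk for ck, fk in zip(c, row))
            for k in range(n_basis):
                mean_resp[k] += c[k] * row[k] / (p * n)
        pen = diff2_penalty_grad(beta)
        beta = [b + step * (r - ck - lam / n * g)
                for b, r, ck, g in zip(beta, mean_resp, c, pen)]
    weights = softmax(beta)

    def density(x):
        return sum(w * fk for w, fk in zip(weights, gaussian_basis(x, centers, sd)))

    return weights, density


# Usage on simulated N(0, 1) data.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
weights, f = fit_penalized_mixture(data)
print(round(sum(weights), 6))  # 1.0: the weights form a convex combination
print(f(0.0) > f(2.5))  # True
```

Because every basis density integrates to 1 and the weights are convex, the fitted f is a proper density by construction. As `lam` grows, the difference penalty forces beta toward a linear trend in k, i.e. log-linear weights across the grid. For large `lam / n` the fixed `step` may need to shrink to keep the ascent stable.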

References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6): 716–723

  • Babu GJ, Canty AJ, Chaubey YP (2002) Application of Bernstein polynomials for smooth estimation of a distribution and density function. J Stat Plan Infer 105(2): 377–392

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, New York, NY

  • Boneva LI, Kendall D, Stefanov I (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst. J R Stat Soc Ser B 33(1): 1–71

  • Butterfield K (1976) The computation of all the derivatives of a B-spline basis. IMA J Appl Math 17(1): 15–25

  • Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13: 195–212. doi:10.1007/BF01246098

  • Claeskens G, Krivobokova T, Opsomer J (2009) Asymptotic properties of penalized spline estimators. Biometrika 96(3): 529–544

  • de Boor C (1978) A practical guide to splines. Springer, Berlin

  • Dias R (1998) Density estimation via hybrid splines. J Stat Comput Simul 60(4): 277–293

  • Efron B, Tibshirani R (1996) Using specially designed exponential families for density estimation. Ann Stat 24(6): 2431–2461

  • Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2): 89–121

  • Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458): 611–631

  • Ghidey W, Lesaffre E, Eilers PHC (2004) Smooth random effects distribution in a linear mixed model. Biometrics 60(4): 945–953

  • Good IJ, Gaskins RA (1971) Nonparametric roughness penalties for probability densities. Biometrika 58(2): 255–277

  • Gu C (1993) Smoothing spline density estimation: A dimensionless automatic algorithm. J Am Stat Assoc 88(422): 495–504

  • Gu C (2009) gss: general smoothing splines. R package version 1.0-5

  • Gu C, Wang J (2003) Penalized likelihood density estimation: direct cross-validation and scalable approximation. Statistica Sinica 13(3): 811–826

  • Hall P, Patil P (1995) Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann Stat 23(3): 905–928

  • Kass RE, Steffey D (1989) Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J Am Stat Assoc 84(407): 717–726

  • Kauermann G (2005) A note on smoothing parameter selection for penalised spline smoothing. J Stat Plan Infer 127(1–2): 53–69

  • Kauermann G, Krivobokova T, Fahrmeir L (2009) Some asymptotic results on generalized penalized spline smoothing. J R Stat Soc Ser B 71(2): 487–503

  • Kauermann G, Opsomer J (2011) Data-driven selection of the spline dimension in penalized spline regression. Biometrika 98(1): 225–230

  • Komárek A (2006) Accelerated failure time models for multivariate doubly-interval-censored data with flexible distributional assumptions. Ph.D. thesis, Leuven: Katholieke Universiteit Leuven, Faculteit Wetenschappen

  • Komárek A, Lesaffre E (2008) Generalized linear mixed model with a penalized Gaussian mixture as a random-effects distribution. Comput Stat Data Anal 52(7): 3441–3458

  • Komárek A, Lesaffre E, Hilton J (2005) Accelerated failure time model for arbitrarily censored data with smoothed error distribution. J Comput Graph Stat 14(3): 726–745

  • Koo JY, Kooperberg C, Park J (1999) Logspline density estimation under censoring and truncation. Scand J Stat 26(1): 87–105

  • Kooperberg C (2009) logspline: Logspline density estimation routines. R package version 2.1.3

  • Li JQ, Barron AR (1999) Mixture density estimation. In: Advances in neural information processing systems 12. MIT Press, Cambridge, pp 279–285

  • Li Y, Ruppert D (2008) On the asymptotics of penalized splines. Biometrika 95(2): 415–436

  • Lindsey JK (1974) Comparison of probability distributions. J R Stat Soc Ser B 36(1): 38–47

  • Lindsey JK (1974) Construction and comparison of statistical models. J R Stat Soc Ser B 36(3): 418–425

  • Liu L, Levine M, Zhu Y (2009) A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization. J Comput Graph Stat 18(2): 481–504

  • McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York

  • Müller P, Quintana F, Rosner G (2009) Bayesian Clustering with Regression. University of Texas M.D. Anderson Cancer Center, Houston, TX 77030 U.S.A

  • Nadaraya E (1974) On the integral mean square error of some nonparametric estimates for the density function. Theory Prob Appl 19(1): 133–141

  • Nadaraya EA (1964) On estimating regression. Theory Prob Appl 9(1): 141–142

  • Nason G (2010) wavethresh: Wavelets statistics and transforms. R package version 4.5

  • Nason GP (2008) Wavelet methods in statistics with R. Springer, Berlin ISBN 978-0-387-75960-9

  • Nason GP, Silverman BW (1999) Wavelets for regression and other statistical problems. In: Schimek MG (ed) Smoothing and regression: approaches, computation, and application, series in probability and statistics. Wiley, New York

  • O’Sullivan F (1986) A statistical perspective on ill-posed inverse problems. Stat Sci 1(4): 502–518

  • Reiss T, Ogden R (2009) Smoothing parameter selection for a class of semiparametric linear models. J R Stat Soc Ser B 71(2): 505–523

  • Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 71(2): 319–392

  • Ruppert D (2002) Selecting the number of knots for penalized splines. J Comput Graph Stat 11(4): 735–757

  • Ruppert D, Wand M, Carroll R (2003) Semiparametric regression. Cambridge University Press, Cambridge

  • Ruppert D, Wand MP, Carroll RJ (2009) Semiparametric regression during 2003–2007. Electron J Stat 3: 1193–1256

  • Schall R (1991) Estimation in generalized linear models with random effects. Biometrika 78(4): 719–727

  • Schellhase C (2010) pendensity: density estimation with a penalized mixture approach. R package version 0.2.3

  • Searle S, Casella G, McCulloch C (1992) Variance components. Wiley, New York

  • Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53(3): 683–690

  • Silverman BW (1982) On the estimation of a probability density function by the maximum penalized likelihood method. Ann Stat 10(3): 795–810

  • Simonoff JS (1996) Smoothing methods in statistics. Springer, New York

  • Wand M (2003) Smoothing and mixed models. Comput Stat 18(2): 223–249

  • Wand M, Jones MC (1995) Kernel smoothing. Chapman and Hall, London

  • Wand MP, Ormerod JT (2008) On semiparametric regression with O’Sullivan penalised splines. Aust N Z J Stat 50(2): 179–198

  • Watson G (1964) Smooth regression analysis. Sankhya Ser A 26: 359–372

  • Wood S (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc Ser B 73(1): 3–36

  • Wood SN (2006) Generalized additive models. Chapman and Hall/CRC, London

  • Young D, Hunter D, Chauveau D, Benaglia T (2009) mixtools: an R package for analyzing mixture models. J Stat Softw 32(6): 1–29

Author information

Correspondence to Göran Kauermann.

About this article

Cite this article

Schellhase, C., Kauermann, G. Density estimation and comparison with a penalized mixture approach. Comput Stat 27, 757–777 (2012). https://doi.org/10.1007/s00180-011-0289-6

  • DOI: https://doi.org/10.1007/s00180-011-0289-6
