Abstract
The paper presents a method for smooth density estimation based on penalized splines. The unknown density is represented as a convex mixture of basis densities whose weights are estimated under a penalty. The proposed method extends the work of Komárek and Lesaffre (Comput Stat Data Anal 52(7):3441–3458, 2008) to general density estimation. Simulations show convincing performance compared with existing density estimation routines. The idea is further extended to let the density depend on a (factorial) covariate. With a binary group indicator, for instance, we can test for equality of the densities in the two groups. This provides a smooth alternative to the classical Kolmogorov-Smirnov test or an analysis of variance, and it shows stable and powerful behaviour.
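To make the mixture idea concrete, the following is a minimal Python/NumPy sketch under stated assumptions; it is not the authors' implementation (their software is the R package pendensity, Schellhase 2010). It writes the density as f(x) = Σ_k c_k φ_k(x), where the basis densities φ_k are assumed here to be cubic B-splines on equidistant knots, each divided by its integral, and the weights c_k = exp(β_k) / Σ_j exp(β_j) are kept in the simplex by a softmax parametrization (also an assumption of this sketch). The coefficients β maximize the penalized log-likelihood Σ_i log f(x_i) − (λ/2) βᵀDᵀDβ, with D a second-order difference matrix. The names `bspline_basis` and `fit_penalized_mixture`, the knot margin, the fixed λ and the backtracking gradient-ascent optimizer are hypothetical choices for illustration; in particular, λ is set by hand here rather than selected from the data.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-splines of the given degree at x via the Cox-de Boor recursion."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    # Degree 0: indicator functions of the half-open knot intervals [t_j, t_{j+1}).
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), len(knots) - degree - 1)


def softmax(beta):
    e = np.exp(beta - beta.max())  # shift for numerical stability
    return e / e.sum()


def fit_penalized_mixture(x, n_basis=20, degree=3, lam=1.0, diff_order=2,
                          n_iter=5000, tol=1e-8):
    """Hypothetical sketch: penalized mixture of normalized B-spline densities."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Equidistant knots; the 5% margin around the data range is an arbitrary choice.
    margin = 0.05 * (x.max() - x.min())
    lo, hi = x.min() - margin, x.max() + margin
    h = (hi - lo) / (n_basis - degree)
    knots = lo + h * np.arange(-degree, n_basis + 1)
    # Every B-spline here is a full (non-truncated) one and integrates to h,
    # so dividing by h turns each basis function into a density.
    phi = bspline_basis(x, knots, degree) / h
    D = np.diff(np.eye(n_basis), n=diff_order, axis=0)
    P = D.T @ D

    def objective(beta):
        with np.errstate(divide="ignore"):
            loglik = np.sum(np.log(phi @ softmax(beta)))
        return loglik - 0.5 * lam * beta @ P @ beta

    def gradient(beta):
        c = softmax(beta)
        r = phi * c / (phi @ c)[:, None]  # posterior basis memberships r_ik
        return r.sum(axis=0) - n * c - lam * (P @ beta)

    beta = np.zeros(n_basis)  # start from equal weights
    obj, step = objective(beta), 1.0 / n
    for _ in range(n_iter):
        g = gradient(beta)
        for _ in range(50):  # backtracking (Armijo) line search
            cand = beta + step * g
            cand_obj = objective(cand)
            if cand_obj >= obj + 1e-4 * step * (g @ g):
                break
            step *= 0.5
        else:
            break  # no acceptable step found: treat as converged
        beta, gain, obj = cand, cand_obj - obj, cand_obj
        step *= 2.0  # let the step grow again after a successful move
        if gain < tol:
            break

    weights = softmax(beta)

    def density(t):
        return bspline_basis(t, knots, degree) @ weights / h

    return weights, density, knots


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1, 200)])
    weights, density, knots = fit_penalized_mixture(sample, lam=0.5)
    print(density(np.array([-2.0, 0.0, 1.0])))
```

Because each φ_k integrates to one and the weights are non-negative and sum to one, the fitted f is a proper density for every β; the penalty only controls how smoothly the weights vary across neighbouring basis functions, with larger λ giving a smoother estimate.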
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6): 716–723
Babu GJ, Canty AJ, Chaubey YP (2002) Application of Bernstein polynomials for smooth estimation of a distribution and density function. J Stat Plan Infer 105(2): 377–392
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York, NY
Boneva LI, Kendall D, Stefanov I (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst. J R Stat Soc Ser B 33(1): 1–71
Butterfield K (1976) The computation of all the derivatives of a B-spline basis. IMA J Appl Math 17(1): 15–25
Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13: 195–212. doi:10.1007/BF01246098
Claeskens G, Krivobokova T, Opsomer J (2009) Asymptotic properties of penalized spline estimators. Biometrika 96(3): 529–544
de Boor C (1978) A practical guide to splines. Springer, Berlin
Dias R (1998) Density estimation via hybrid splines. J Stat Comput Simul 60(4): 277–293
Efron B, Tibshirani R (1996) Using specially designed exponential families for density estimation. Ann Stat 24(6): 2431–2461
Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2): 89–121
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458): 611–631
Ghidey W, Lesaffre E, Eilers PHC (2004) Smooth random effects distribution in a linear mixed model. Biometrics 60(4): 945–953
Good IJ, Gaskins RA (1971) Nonparametric roughness penalties for probability densities. Biometrika 58(2): 255–277
Gu C (1993) Smoothing spline density estimation: a dimensionless automatic algorithm. J Am Stat Assoc 88(422): 495–504
Gu C (2009) gss: general smoothing splines. R package version 1.0-5
Gu C, Wang J (2003) Penalized likelihood density estimation: direct cross-validation and scalable approximation. Statistica Sinica 13(3): 811–826
Hall P, Patil P (1995) Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann Stat 23(3): 905–928
Kass RE, Steffey D (1989) Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J Am Stat Assoc 84(407): 717–726
Kauermann G (2005) A note on smoothing parameter selection for penalised spline smoothing. J Stat Plan Infer 127(1–2): 53–69
Kauermann G, Krivobokova T, Fahrmeir L (2009) Some asymptotic results on generalized penalized spline smoothing. J R Stat Soc Ser B 71(2): 487–503
Kauermann G, Opsomer J (2011) Data-driven selection of the spline dimension in penalized spline regression. Biometrika 98(1): 225–230
Komárek A (2006) Accelerated failure time models for multivariate doubly-interval-censored data with flexible distributional assumptions. Ph.D. thesis, Leuven: Katholieke Universiteit Leuven, Faculteit Wetenschappen
Komárek A, Lesaffre E (2008) Generalized linear mixed model with a penalized Gaussian mixture as a random-effects distribution. Comput Stat Data Anal 52(7): 3441–3458
Komárek A, Lesaffre E, Hilton J (2005) Accelerated failure time model for arbitrarily censored data with smoothed error distribution. J Comput Graph Stat 14(3): 726–745
Koo JY, Kooperberg C, Park J (1999) Logspline density estimation under censoring and truncation. Scand J Stat 26(1): 87–105
Kooperberg C (2009) logspline: Logspline density estimation routines. R package version 2.1.3
Li JQ, Barron AR (1999) Mixture density estimation. In: Advances in neural information processing systems 12. MIT Press, Cambridge, pp 279–285
Li Y, Ruppert D (2008) On the asymptotics of penalized splines. Biometrika 95(2): 415–436
Lindsey JK (1974) Comparison of probability distributions. J R Stat Soc Ser B 36(1): 38–47
Lindsey JK (1974) Construction and comparison of statistical models. J R Stat Soc Ser B 36(3): 418–425
Liu L, Levine M, Zhu Y (2009) A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization. J Comput Graph Stat 18(2): 481–504
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
Müller P, Quintana F, Rosner G (2009) Bayesian clustering with regression. University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
Nadaraya E (1974) On the integral mean square error of some nonparametric estimates for the density function. Theory Prob Appl 19(1): 133–141
Nadaraya EA (1964) On estimating regression. Theory Prob Appl 9(1): 141–142
Nason G (2010) wavethresh: Wavelets statistics and transforms. R package version 4.5
Nason GP (2008) Wavelet methods in statistics with R. Springer, Berlin. ISBN 978-0-387-75960-9
Nason GP, Silverman BW (1999) Wavelets for regression and other statistical problems. In: Schimek MG (ed) Smoothing and regression: approaches, computation, and application, series in probability and statistics. Wiley, New York
O’Sullivan F (1986) A statistical perspective on ill-posed inverse problems. Stat Sci 1(4): 502–518
Reiss T, Ogden R (2009) Smoothing parameter selection for a class of semiparametric linear models. J R Stat Soc Ser B 71(2): 505–523
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 71(2): 319–392
Ruppert D (2002) Selecting the number of knots for penalized splines. J Comput Graph Stat 11(4): 735–757
Ruppert D, Wand M, Carroll R (2003) Semiparametric regression. Cambridge University Press, Cambridge
Ruppert D, Wand MP, Carroll RJ (2009) Semiparametric regression during 2003–2007. Electron J Stat 3: 1193–1256
Schall R (1991) Estimation in generalized linear models with random effects. Biometrika 78(4): 719–727
Schellhase C (2010) pendensity: density estimation with a penalized mixture approach. R package version 0.2.3
Searle S, Casella G, McCulloch C (1992) Variance components. Wiley, New York
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53(3): 683–690
Silverman BW (1982) On the estimation of a probability density function by the maximum penalized likelihood method. Ann Stat 10(3): 795–810
Simonoff JS (1996) Smoothing methods in statistics. Springer, New York
Wand M (2003) Smoothing and mixed models. Comput Stat 18(2): 223–249
Wand M, Jones MC (1995) Kernel smoothing. Chapman and Hall, London
Wand MP, Ormerod JT (2008) On semiparametric regression with O’Sullivan penalised splines. Aust N Z J Stat 50(2): 179–198
Watson G (1964) Smooth regression analysis. Sankhya Ser A 26: 359–372
Wood S (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc Ser B 73(1): 3–36
Wood SN (2006) Generalized additive models. Chapman and Hall/CRC, London
Young D, Hunter D, Chauveau D, Benaglia T (2009) mixtools: an R package for analyzing mixture models. J Stat Softw 32(6): 1–29
About this article
Cite this article
Schellhase, C., Kauermann, G. Density estimation and comparison with a penalized mixture approach. Comput Stat 27, 757–777 (2012). https://doi.org/10.1007/s00180-011-0289-6
DOI: https://doi.org/10.1007/s00180-011-0289-6