Abstract
We consider an alternative or expanded data environment in which we have sample data from a set of densities that are thought to be similar (as measured by Kullback–Leibler divergence). While estimation methods could easily be applied to each sample separately, the purpose of this manuscript is to develop an estimator that: (1) offers greater efficiency if the densities are in fact similar while seemingly losing none if they are dissimilar; (2) does not require knowledge of the form or extent of the similarities between the densities; (3) can be used with parametric or nonparametric methods; (4) allows for correlated data; and (5) is relatively easy to implement. Simulations indicate that finite sample performance, in particular small sample performance, is quite promising. Interestingly, when the set of possible densities contains both similar and dissimilar densities, the proposed estimator appropriately puts weight on the similar densities and not on the dissimilar ones. We apply the proposed estimator to recover a set of county crop yield densities and their corresponding crop insurance premium rates.
Notes
We note that KL divergence is not a true distance measure like Hellinger distance because it is not symmetric and does not satisfy the triangle inequality.
The Kullback–Leibler divergence between densities f(x) and g(x) is defined as \(KL(f(x),g(x)) = \int \log \left(\frac{g(x)}{f(x)}\right) g(x)\,dx\).
All simulation results using mean integrated squared error are available from the authors.
The densities are: (1) standard normal; (2) skewed unimodal; (3) strongly skewed unimodal; (4) kurtotic unimodal; (5) outlier; (6) bimodal; (7) separated bimodal; (8) asymmetric bimodal; (9) trimodal. The remaining densities have highly irregular shapes that would not plausibly represent yield densities.
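As a rough illustration of the Kullback–Leibler definition given in the notes above, the following Python sketch approximates the integral \(\int \log(g(x)/f(x))\, g(x)\,dx\) on a grid with the trapezoidal rule. The function names `normal_pdf` and `kl_divergence`, the grid, and the two normal densities are assumptions chosen for illustration only; they are not taken from the paper and say nothing about how the proposed estimator is implemented.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kl_divergence(f_vals, g_vals, x):
    """Trapezoidal approximation of int log(g(x)/f(x)) g(x) dx on the grid x."""
    integrand = g_vals * (np.log(g_vals) - np.log(f_vals))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))

# Hypothetical example densities: f = N(0, 1) and g = N(0, 2^2).
x = np.linspace(-20.0, 20.0, 40001)
f_vals = normal_pdf(x, 0.0, 1.0)
g_vals = normal_pdf(x, 0.0, 2.0)

kl_fg = kl_divergence(f_vals, g_vals, x)  # KL(f, g) in the notation of the notes
kl_gf = kl_divergence(g_vals, f_vals, x)  # same formula with the arguments swapped

# Closed-form values for two zero-mean normals, used only as a check.
exact_fg = np.log(1.0 / 2.0) + 4.0 / 2.0 - 0.5
exact_gf = np.log(2.0 / 1.0) + 1.0 / 8.0 - 0.5

print(kl_fg, exact_fg)
print(kl_gf, exact_gf)
```

The two printed pairs differ from each other, which illustrates the asymmetry that keeps KL divergence from being a true distance. With estimated rather than known densities, the grid values would come from the estimator in use, and regions where either density is numerically zero would need separate handling.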
Cite this article
Ker, A.P., Liu, Y. Bayesian model averaging of possibly similar nonparametric densities. Comput Stat 32, 349–365 (2017). https://doi.org/10.1007/s00180-016-0700-4
DOI: https://doi.org/10.1007/s00180-016-0700-4