
Saddlepoint approximations to P-values for comparison of density estimates

Published in Computational Statistics.

Summary

This article is concerned with computing approximate p-values for the maximum of the absolute difference between two kernel density estimates. The approximations treat the process of local extrema of the difference as a nonhomogeneous Poisson process, which is characterized by an intensity function determining the rate of local extrema above a given threshold. The key idea of this article is to estimate that intensity function more accurately by applying saddlepoint approximations to the joint density of the difference between the kernel density estimates and its first and second derivatives. Saddlepoint approximations are compared with Gaussian approximations; in simulations, the saddlepoint approximations show consistently better agreement between empirical p-values and their nominal values across various bandwidths of the kernel density estimates.


Figures 1–4 appear in the full article.



Acknowledgement

This work was supported in part by grant R03-2002-000-00034-0 from the Korea Science and Engineering Foundation (KOSEF) in 2002–2004. This paper was derived from the author’s Ph.D. dissertation at Columbia University, completed under the supervision of Dr. Daniel Rabinowitz.

Author information


Correspondence to Hoi-Jeong Lim.

Appendix

A-1.

$$\begin{aligned} E\Delta \left( t \right) &= E\left\{ {\frac{1}{n}\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} - \frac{1}{m}\sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right\} \\ &= \frac{1}{n} \cdot n \cdot E\left( {w\left( {{x_i} - t} \right)} \right) - \frac{1}{m} \cdot m \cdot E\left( {w\left( {{y_j} - t} \right)} \right) \\ &= E\left( {w\left( {{u_i}} \right)} \right) - E\left( {w\left( {{u_j}} \right)} \right) = \mu - \mu = 0 \end{aligned}$$

Since \(u_1, u_2, \ldots, u_{n+m}\) all have the same distribution under the null hypothesis, with mean

$$\mu = \frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w\left( {{u_i}} \right)} $$
$$\begin{aligned} E\Delta '\left( t \right) &= E\frac{\partial }{{\partial t}}\left\{ {\frac{1}{n}\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} - \frac{1}{m}\sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right\} \\ &= - \frac{1}{n} \cdot n \cdot E\left( {w'\left( {{x_i} - t} \right)} \right) + \frac{1}{m} \cdot m \cdot E\left( {w'\left( {{y_j} - t} \right)} \right) \\ &= - \frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w'\left( {{u_i}} \right)} + \frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w'\left( {{u_i}} \right)} = 0 \end{aligned}$$

In a similar way, E∆″(t) = 0 can be proved.
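The A-1 identity can be checked numerically: under the null hypothesis every assignment of the pooled values to the two samples is equally likely, so averaging Δ(t) over all \(\binom{n+m}{n}\) assignments should give exactly zero. Below is an illustrative exact-enumeration sketch; the function `perm_mean_delta` and the bitmask enumeration are assumptions made here, not part of the paper's code.

```c
#include <stddef.h>

/* Average of Delta(t) = (1/n) sum_{i in S} w_i - (1/m) sum_{i not in S} w_i
   over every subset S of size n of the pooled values w_i = w(u_i - t).
   By A-1 this average is exactly zero.  Subsets are enumerated by
   bitmask, so N must be small (say N <= 20). */
double perm_mean_delta(const double *w, int N, int n)
{
    int m = N - n;
    double total = 0.0;
    long count = 0;
    for (unsigned long mask = 0; mask < (1ul << N); mask++) {
        /* count the bits set in mask */
        int bits = 0;
        for (unsigned long b = mask; b; b >>= 1)
            bits += (int)(b & 1ul);
        if (bits != n)
            continue;
        double sx = 0.0, sy = 0.0;
        for (int i = 0; i < N; i++) {
            if (mask & (1ul << i)) sx += w[i];
            else                   sy += w[i];
        }
        total += sx / n - sy / m;
        count++;
    }
    return total / (double)count;
}
```

For any pooled sample the returned average is zero up to floating-point rounding, matching the derivation above.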

A-2.

$$\begin{aligned} \operatorname{var} \Delta \left( t \right) = E{\Delta ^2}\left( t \right) &= E{\left\{ {\frac{1}{n}\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} - \frac{1}{m}\sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right\}^2} \\ &= E\left\{ {\frac{1}{{{n^2}}}{{\left( {\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} } \right)}^2} + \frac{1}{{{m^2}}}{{\left( {\sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right)}^2} - \frac{2}{{nm}}\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} \sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right\} \\ &= E\left\{ {\frac{1}{{{n^2}}}\left[ {\sum\limits_{i = 1}^n {w{{\left( {{x_i} - t} \right)}^2}} + \sum\limits_{i = 1}^n {\sum\limits_{j \ne i} {w\left( {{x_i} - t} \right)w\left( {{x_j} - t} \right)} } } \right] + \frac{1}{{{m^2}}}\left[ {\sum\limits_{j = 1}^m {w{{\left( {{y_j} - t} \right)}^2}} + \sum\limits_{j = 1}^m {\sum\limits_{i \ne j} {w\left( {{y_j} - t} \right)w\left( {{y_i} - t} \right)} } } \right] - \frac{2}{{nm}}\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} \sum\limits_{j = 1}^m {w\left( {{y_j} - t} \right)} } \right\} \\ &= \frac{1}{{{n^2}}}\,n\,Ew{\left( {{u_i}} \right)^2} + \frac{{n\left( {n - 1} \right)}}{{{n^2}}}\,E\left( {w\left( {{u_i}} \right)w\left( {{u_j}} \right)} \right) + \frac{1}{{{m^2}}}\,m\,Ew{\left( {{u_i}} \right)^2} + \frac{{m\left( {m - 1} \right)}}{{{m^2}}}\,E\left( {w\left( {{u_i}} \right)w\left( {{u_j}} \right)} \right) - \frac{2}{{nm}}\,nm\,E\left( {w\left( {{u_i}} \right)w\left( {{u_j}} \right)} \right) \qquad \left( {i \ne j} \right) \end{aligned}$$

Since all realizations of u are equally likely,

$$E\sum\limits_{i = 1}^n {w\left( {{u_i}} \right)} = n\,Ew\left( {{u_i}} \right),$$

so that

$$\operatorname{var} \Delta \left( t \right) = \left( {\frac{1}{n} + \frac{1}{m}} \right)\frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w{{\left( {{u_i}} \right)}^2}} + \left( {\frac{{n - 1}}{n} + \frac{{m - 1}}{m} - 2} \right)\frac{1}{{n + m}}\frac{1}{{n + m - 1}}\sum\limits_{i = 1}^{n + m} {\sum\limits_{j \ne i} {w\left( {{u_i}} \right)w\left( {{u_j}} \right)} } .$$

Since

$$Ew{\left( {{u_i}} \right)^2} = \frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w{{\left( {{u_i}} \right)}^2}} ,$$

after a little more arithmetic manipulation, using the relation

$$\sum\limits_{i = 1}^n {w\left( {{x_i} - t} \right)} \sum\limits_{j = 1}^n {w\left( {{x_j} - t} \right)} = \sum\limits_{i = 1}^n {w{{\left( {{x_i} - t} \right)}^2}} + \sum\limits_{i = 1}^n {\sum\limits_{j \ne i} {w\left( {{x_i} - t} \right)w\left( {{x_j} - t} \right)} } ,$$

\(\operatorname{var} \Delta \left( t \right)\) is given by

$$\left( {\frac{1}{n} + \frac{1}{m}} \right)\left\{ {\frac{1}{{n + m}}\sum\limits_{i = 1}^{n + m} {w{{\left( {{u_i}} \right)}^2} - {{\left( {\frac{{\sum\limits_{i = 1}^{n + m} {w\left( {{u_i}} \right)} }}{{n + m}}} \right)}^2}} } \right\}\frac{{n + m}}{{n + m - 1}}.$$
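This closed form can be verified by brute force: for a small pooled sample, enumerating every assignment of the n + m values to the two groups and computing the exact variance of Δ(t) reproduces the formula. The sketch below is illustrative; the function names `perm_var_delta` and `formula_var_delta` are assumptions made here, not part of the paper's code.

```c
#include <stddef.h>

/* Exact variance of Delta(t) over all subsets S of size n of the pooled
   values w_i = w(u_i - t), by direct enumeration (small N only). */
double perm_var_delta(const double *w, int N, int n)
{
    int m = N - n;
    double sum = 0.0, sumsq = 0.0;
    long count = 0;
    for (unsigned long mask = 0; mask < (1ul << N); mask++) {
        int bits = 0;
        for (unsigned long b = mask; b; b >>= 1)
            bits += (int)(b & 1ul);
        if (bits != n)
            continue;
        double sx = 0.0, sy = 0.0;
        for (int i = 0; i < N; i++) {
            if (mask & (1ul << i)) sx += w[i];
            else                   sy += w[i];
        }
        double d = sx / n - sy / m;   /* Delta(t) for this assignment */
        sum += d;
        sumsq += d * d;
        count++;
    }
    double mean = sum / (double)count;
    return sumsq / (double)count - mean * mean;
}

/* The closed-form variance derived in A-2:
   (1/n + 1/m) * { mean of w^2 - (mean of w)^2 } * (n+m)/(n+m-1). */
double formula_var_delta(const double *w, int N, int n)
{
    int m = N - n;
    double s = 0.0, s2 = 0.0;
    for (int i = 0; i < N; i++) {
        s += w[i];
        s2 += w[i] * w[i];
    }
    double wbar = s / N;
    return (1.0 / n + 1.0 / m) * (s2 / N - wbar * wbar)
           * (double)N / (double)(N - 1);
}
```

The two functions agree to floating-point precision for any pooled sample, which is a useful sanity check when implementing the intensity estimates.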

The C code, including the modules for Appendices A-1 and A-2, is available upon request.


Cite this article

Lim, HJ. Saddlepoint approximations to P-values for comparison of density estimates. Computational Statistics 20, 31–50 (2005). https://doi.org/10.1007/BF02736121
