ABSTRACT
Kernel density estimates are a robust way to reconstruct a continuous distribution from a discrete point set. Their effectiveness is typically measured in L1 or L2 error. In this paper we investigate the challenges of using L∞ (worst-case) error, a stronger measure than either L1 or L2. We present efficient solutions to two linked challenges: how to evaluate the L∞ error between two kernel density estimates, and how to choose the bandwidth parameter for a kernel density estimate built on a subsample of a large data set.
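The abstract does not describe the algorithms themselves. The following is therefore only a minimal, naive sketch of the two problems it names, not the paper's method. It makes several assumptions. It uses a 1-D Gaussian kernel. It approximates the L∞ error by the largest absolute difference over a finite set of query points, which can only underestimate the true worst case. It picks the subsample bandwidth by brute-force search over a hypothetical list of candidates. All function names are hypothetical.

```python
import math
import random
from typing import Sequence


def gaussian_kde(points: Sequence[float], bandwidth: float, x: float) -> float:
    """Evaluate a normalized 1-D Gaussian kernel density estimate at x."""
    if bandwidth <= 0 or not points:
        raise ValueError("need a positive bandwidth and a non-empty point set")
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def approx_linf_error(p_points, p_bw, q_points, q_bw, queries) -> float:
    """Largest |KDE_P(x) - KDE_Q(x)| over the given query points (hypothetical helper).

    Only finitely many x are checked, so this is a lower bound on the
    true L-infinity error, not its exact value.
    """
    return max(
        abs(gaussian_kde(p_points, p_bw, x) - gaussian_kde(q_points, q_bw, x))
        for x in queries
    )


def select_subsample_bandwidth(p_points, p_bw, q_points, candidates, queries) -> float:
    """Brute-force choice of the subsample bandwidth (hypothetical helper).

    Returns the candidate whose subsample KDE has the smallest approximate
    L-infinity error against the full-data KDE with bandwidth p_bw.
    """
    # The full-data KDE does not depend on the candidate, so evaluate it once.
    reference = [gaussian_kde(p_points, p_bw, x) for x in queries]

    def error(h: float) -> float:
        return max(
            abs(r - gaussian_kde(q_points, h, x)) for r, x in zip(reference, queries)
        )

    return min(candidates, key=error)


if __name__ == "__main__":
    # Hypothetical demo: synthetic normal data, a 10% subsample, a fixed grid.
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(500)]
    sample = random.sample(data, 50)
    grid = [-4.0 + 0.08 * i for i in range(101)]  # query points on [-4, 4]
    h = select_subsample_bandwidth(data, 0.3, sample, [0.2, 0.3, 0.4, 0.5, 0.6], grid)
    print("chosen subsample bandwidth:", h)
    print("approx L-inf error:", approx_linf_error(data, 0.3, sample, h, grid))
```

A fixed grid is the simplest choice of query set. However, any finite set of query points only bounds the worst-case error from below. This is why evaluating the exact L∞ error efficiently is a problem in its own right.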
Index Terms
- L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data