L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data

Authors:
Yan Zheng, University of Utah, Salt Lake City, UT, USA
Jeff M. Phillips, University of Utah, Salt Lake City, UT, USA
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2015, Pages 1533–1542
https://doi.org/10.1145/2783258.2783357

Published: 10 August 2015

ABSTRACT

Kernel density estimates are a robust way to reconstruct a continuous distribution from a discrete point set. Typically their effectiveness is measured in either L₁ or L₂ error. In this paper we investigate the challenges in using L∞ (or worst-case) error, a stronger measure than L₁ or L₂. We present efficient solutions to two linked challenges: how to evaluate the L∞ error between two kernel density estimates, and how to choose the bandwidth parameter for a kernel density estimate built on a subsample of a large data set.
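To make the two challenges concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes one-dimensional data, a Gaussian kernel, and a fixed evaluation grid; the helper names (`gaussian_kde`, `linf_error_on_grid`, `pick_subsample_bandwidth`), the reference bandwidth 0.3, and the candidate list are all hypothetical choices made for illustration. Because the maximum is taken only over grid points, the reported value is a lower bound on the true L∞ error, and the bandwidth search is plain brute force.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a function x -> average of Gaussian kernels centred on `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def linf_error_on_grid(f, g, grid):
    """Largest |f(x) - g(x)| over the grid (a lower bound on the true L-infinity error)."""
    return max(abs(f(x) - g(x)) for x in grid)


def pick_subsample_bandwidth(full_kde, subsample, candidates, grid):
    """Brute force: the candidate bandwidth whose subsample KDE is closest on the grid."""
    return min(
        candidates,
        key=lambda w: linf_error_on_grid(full_kde, gaussian_kde(subsample, w), grid),
    )


# Synthetic stand-in for a "large" data set and a much smaller subsample of it.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
subsample = random.sample(data, 100)
grid = [-4.0 + 8.0 * i / 100 for i in range(101)]

# Assumed reference bandwidth for the full data; candidates are illustrative only.
full_kde = gaussian_kde(data, 0.3)
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

best = pick_subsample_bandwidth(full_kde, subsample, candidates, grid)
err = linf_error_on_grid(full_kde, gaussian_kde(subsample, best), grid)
print(f"chosen bandwidth: {best}, grid L-infinity error: {err:.4f}")
```

A grid is the simplest way to probe worst-case error, but it can miss the true maximizer between grid points and scales poorly with dimension, which is one reason evaluating L∞ error is harder than it first appears.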

Supplemental Material

p1533.m4v (m4v, 3.3 GB)

Index Terms

1. Theory of computation
  1. Semantics and reasoning
    1. Program constructs
      1. Control primitives

Published in
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258
General Chairs: Longbing Cao (University of Technology, Sydney), Chengqi Zhang (University of Technology, Sydney)
Program Chairs: Thorsten Joachims (Cornell University), Geoff Webb (Monash University), Dragos D. Margineantu (Boeing Research), Graham Williams (Australian Taxation Office)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bandwidth selection
coresets
kernel density estimates
Qualifiers
- research-article
Acceptance Rates
KDD '15 paper acceptance rate: 160 of 819 submissions, 20%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%