research-article

L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data

Published: 10 August 2015

ABSTRACT

Kernel density estimates are a robust way to reconstruct a continuous distribution from a discrete point set. Typically their effectiveness is measured either in L1 or L2 error. In this paper we investigate the challenges in using L∞ (or worst case) error, a stronger measure than L1 or L2. We present efficient solutions to two linked challenges: how to evaluate the L∞ error between two kernel density estimates and how to choose the bandwidth parameter for a kernel density estimate built on a subsample of a large data set.
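To make the two challenges concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the abstract does not describe. It assumes one-dimensional data and a Gaussian kernel, and it only approximates the L∞ error by taking the largest difference over a fixed grid of evaluation points, so the true worst-case error may be larger. The function names (`kde`, `linf_error_on_grid`, `pick_bandwidth`), the reference bandwidth of 0.3 for the full data set, and the candidate bandwidths are all hypothetical choices made for illustration.

```python
import math
import random


def kde(points, bandwidth, x):
    """Evaluate a 1-D Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def linf_error_on_grid(points_a, bw_a, points_b, bw_b, grid):
    """Largest absolute difference between two KDEs, checked at the grid points only."""
    return max(abs(kde(points_a, bw_a, x) - kde(points_b, bw_b, x)) for x in grid)


def pick_bandwidth(full, full_bw, sample, candidates, grid):
    """Brute-force search: the candidate whose subsample KDE is closest to the full KDE."""
    return min(
        candidates,
        key=lambda bw: linf_error_on_grid(full, full_bw, sample, bw, grid),
    )


# Synthetic data (assumption): 1000 draws from a standard normal, subsample of 200.
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sample = rng.sample(full, 200)

# Evaluation grid on [-4, 4]; a finer grid gives a tighter estimate at higher cost.
grid = [-4.0 + 8.0 * i / 160 for i in range(161)]
candidates = [0.1, 0.2, 0.3, 0.5, 0.8]

best = pick_bandwidth(full, 0.3, sample, candidates, grid)
print("chosen bandwidth:", best)
print("grid L-infinity error:", linf_error_on_grid(full, 0.3, sample, best, grid))
```

Each evaluation of the full estimate touches every data point, so this brute-force approach scales poorly with data size, which is the kind of cost the paper's efficient solutions are meant to avoid.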



    • Published in

      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
