Elsevier

Pattern Recognition

Volume 29, Issue 10, October 1996, Pages 1719-1736

A data driven procedure for density estimation with some applications

https://doi.org/10.1016/0031-3203(96)00028-3

Abstract

This paper deals with probability density estimation using a kernel-based approach in which the window size of the kernel is found by a data-driven procedure. It is shown theoretically that, under certain assumptions, the estimated densities on bounded sets can be asymptotically unbiased when the window width is obtained from the minimal spanning tree of the observed data. The theoretical development, initially carried out on R², is applicable to higher-dimensional spaces. The results are verified experimentally on bounded sets with different types of distributions. The estimator is also seen experimentally to behave well on unbounded sets, as in the case of a Gaussian density. Some applications of the proposed density estimation technique are demonstrated. One application is the representative point detection algorithm, which can be applied for data reduction and outlier rejection. Another application involves detecting the border points of a dot pattern as well as finding a thinned version of the dot pattern.
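The abstract does not state the exact rule that turns the minimal spanning tree (MST) into a window width, so the following is only a minimal sketch under stated assumptions. It assumes, purely for illustration, that the width h is a fixed multiple of the mean MST edge length, and it uses a Gaussian kernel on R²; both choices, along with the helper names `mst_edge_lengths` and `mst_kde`, are hypothetical and should not be read as the authors' procedure. The MST is built with Prim's algorithm in O(n²) time, which keeps the example dependency-free.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def mst_edge_lengths(points: Sequence[Point]) -> List[float]:
    """Return the n-1 edge lengths of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each vertex to the tree
    best[0] = 0.0          # start growing the tree from vertex 0
    lengths: List[float] = []
    for step in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:       # the starting vertex contributes no edge
            lengths.append(best[u])
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def mst_kde(points: Sequence[Point], scale: float = 1.0) -> Tuple[Callable[[Point], float], float]:
    """Gaussian KDE on R^2 whose window width is derived from the MST.

    ASSUMPTION (hypothetical rule): h = scale * mean MST edge length.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to build an MST")
    edges = mst_edge_lengths(points)
    h = scale * sum(edges) / len(edges)
    if h <= 0.0:
        raise ValueError("degenerate sample: all points coincide")
    n = len(points)
    norm = 1.0 / (2.0 * math.pi * n * h * h)  # 2-D Gaussian kernel normalisation

    def density(x: Point) -> float:
        total = 0.0
        for p in points:
            d2 = (x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
            total += math.exp(-d2 / (2.0 * h * h))
        return norm * total

    return density, h
```

For example, `density, h = mst_kde(sample)` returns a callable estimate together with the width it chose. Because h is computed from the sample's own nearest-neighbour structure, it shrinks automatically as the sample becomes denser, which is the data-driven behaviour the abstract describes; the `scale` factor is an illustrative tuning knob, not a parameter taken from the paper.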
