Computationally Efficient Exact Calculation of Kernel Density Derivatives


Abstract

Machine learning research related to the derivatives of the kernel density estimator has received limited attention compared to the density estimator itself, despite the general consensus that most of the important features of a data distribution, such as modes, curvature, and even cluster structure, are characterized by its derivatives. In this paper we present a computationally efficient algorithm to calculate kernel density estimates and their derivatives for linearly separable kernels, with significant savings especially for high-dimensional data and higher-order derivatives. The algorithm substantially reduces the number of operations (multiplications and derivative evaluations) needed to calculate the estimates, while keeping the results exact (i.e., no approximations are involved). The main idea is that the calculation of multivariate separable kernels and their derivatives, such as the gradient vector and the Hessian matrix, involves a significant number of redundant operations that can be eliminated using the chain rule. We present a tree-based algorithm that calculates exact kernel density estimates and derivatives in the most efficient fashion, with particular focus on optimizing the kernel evaluations for individual data pairs. In contrast, most approaches in the literature resort to function approximations or downsampling. The overall computational savings of the presented method could be increased further by incorporating such approximations, which aim to reduce the number of data pairs considered. The theoretical computational complexity of the tree-based method is compared with that of the direct method, which performs all multiplications. The experimental results consider the calculation of separable kernels and their derivatives, as well as a measure, employing the first and second derivatives, that evaluates how close a point is to the principal curve of a density. These results indicate a considerable improvement in computational complexity, and hence running time, over the direct approach.
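To make the separability idea concrete, the sketch below (Python/NumPy; the function name, scalar bandwidth, and array shapes are illustrative assumptions, and this is a direct computation rather than the proposed tree-based algorithm) evaluates an exact kernel density estimate and its gradient for a product Gaussian kernel. The per-dimension kernel factors are computed once per data point and reused across all gradient components; the redundant operations that remain in such a direct computation are exactly what the tree-based method eliminates systematically for higher-order derivatives.

    import numpy as np

    def kde_and_gradient(x, data, h):
        # Exact KDE value and gradient at query point x for a product (separable)
        # Gaussian kernel with a common scalar bandwidth h. Illustrative only.
        n, d = data.shape
        u = (x[None, :] - data) / h                              # standardized differences, shape (n, d)
        phi = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * h)  # per-dimension kernel factors
        prod = np.prod(phi, axis=1)                              # product kernel for each data point
        density = prod.mean()
        # d/dx_k of the product kernel replaces the k-th factor by its derivative,
        # i.e. multiplies the shared product by (-u_k / h).
        grad = np.array([np.mean(prod * (-u[:, k] / h)) for k in range(d)])
        return density, grad

    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 3))                             # 500 samples in 3 dimensions
    density, grad = kde_and_gradient(np.zeros(3), data, h=0.5)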



Acknowledgments

This work is supported by NSF under grants IIS-1149570, IIS-1118061, and NIH grant 1R21EY022387.

Author information


Corresponding author

Correspondence to Matineh Shaker.

Additional information

M. Shaker and J. N. Myhre contributed equally to this work.

Appendices

Appendix A:

Proof of fact 1

Subject to the constraint \(\sum _{l} \left | v_{lm} \right | \leq p\) on the growth of T, the number of nodes in the l-th layer is equal to \(\binom {l+p}{p}\). The proof is by mathematical induction on l=1,…,d. Let the number of nodes in layer l be \(n_{l}\).

Base case

When l=1, the first dimension contains all the derivative orders j=0,…,p, which makes \(n_{1}=p+1\). Each node with index j at the first layer is expanded into \(p+1-j\) child nodes in the second layer, and therefore \(n_{2}=(p+1)+p+\dots +2+1=\binom {2+p}{2}\), which supports the statement for l=2.

Induction step

Let l be the layer number, and suppose the hypothesis is true for \(l=l'\). Using Pascal's rule for binomial coefficients, \(n_{l'+1}\) can be written as

$$ \binom{l'+1+p}{p}=\binom{l'+p}{p}+\binom{l'+p}{p-1} $$
(19)

where the left-hand term is the number of nodes at layer \(l'+1\). Looking back at the tree structure, with Figure 1 as an example, it can be observed that the tree has a self-repeating pattern. At the new layer \(l'+1\), the nodes are divided into two parts. The upper part is originally expanded from \(v_{11}\), and it exactly repeats the nodes of the previous layer. The reason is that the starting node of this branch is the node containing j=0 at the first layer, which does not change the maximum derivative order constraint. As a result, the branch can be shifted one layer back towards the root, and this recursive structure propagates to the whole tree. In (19), this branch is represented by the term \(\binom {l'+p}{p}\), which is equal to \(n_{l'}\) according to the induction hypothesis.

The lower part of the tree is expanded from all other nodes of the first layer except \(v_{11}\). One can observe that these branches are identical to those of a tree with the same number of layers but with maximum derivative order p−1. Again, the reasoning is based on the maximum derivative order constraint: since the node value at the first layer is nonzero, these nodes have one less derivative order available for valid expansion. By the induction hypothesis, the number of nodes in layer \(l'+1\) subject to the constraint \(\sum _{l} \left | v_{lm} \right | \leq (p-1)\) is \(\binom {(l'+1)+(p-1)}{p-1}=\binom {l'+p}{p-1}\), which equals the second term on the right-hand side of (19). Thus the statement holds for \(l=l'+1\), and by induction it is true in general.
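As a quick numerical confirmation of this count (an illustrative check, not part of the paper's algorithm): the nodes in layer l correspond to tuples of non-negative per-dimension derivative orders \((j_{1},\dots ,j_{l})\) with \(j_{1}+\dots +j_{l}\leq p\), and a brute-force enumeration of these tuples matches the closed form \(\binom{l+p}{p}\) for small l and p.

    from itertools import product
    from math import comb

    def count_layer_nodes(l, p):
        # Count tuples (j_1, ..., j_l) of non-negative derivative orders with total order <= p.
        return sum(1 for orders in product(range(p + 1), repeat=l) if sum(orders) <= p)

    for l in range(1, 6):
        for p in range(0, 5):
            assert count_layer_nodes(l, p) == comb(l + p, p)
    print("layer node counts match binom(l+p, p)")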

Appendix B:

In the case of multivariate separable Gaussian kernels, the exponent has the Mahalanobis distance form. Therefore, decomposing the inverse of the covariance matrix (the precision matrix) gives a matrix that scales and orients the input data according to the desired kernel bandwidth.

$$ \mathbf{\Sigma}_{i}^{-1} = \mathbf{S}_{i}^{T} \mathbf{S}_{i}. $$
(20)

In order to extract \(\mathbf{S}_{i}\), the eigendecomposition of \(\mathbf{\Sigma}_{i}\) is utilized

$$ \mathbf{\Sigma}_{i} = \mathbf{Q}_{i} \mathbf{\Lambda}_{i} \mathbf{Q}_{i}^{T}, $$
(21)

where the eigenvectors and eigenvalues of \(\mathbf{\Sigma}_{i}\) are the columns of the orthogonal matrix \(\mathbf{Q}_{i}\) and the diagonal elements of the diagonal matrix \(\mathbf{\Lambda}_{i}\), respectively. From (20) and (21), the scale matrix \(\mathbf{S}_{i}\) is written as follows

$$ \mathbf{S}_{i} = \mathbf{\Lambda}_{i}^{-1/2} \mathbf{Q}_{i}^{T}. $$
(22)

With this formulation, \(\mathbf{S}_{i}\) is the bandwidth matrix associated with \(\mathbf{X}_{i}\). Alternatively, a Cholesky factorization of the precision matrix, \(\mathbf{\Sigma}_{i}^{-1}=\mathbf {R}^{T}_{i}\mathbf {R}_{i}\), would yield \(\mathbf{S}_{i}=\mathbf{R}_{i}\) as an option.
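As a minimal numerical check of the two factorization routes (the covariance below is a generic synthetic matrix, not data from the paper), both the eigendecomposition of (21) and (22) and a Cholesky factorization of the precision matrix produce a scale matrix satisfying \(\mathbf{S}_{i}^{T}\mathbf{S}_{i}=\mathbf{\Sigma}_{i}^{-1}\) as in (20).

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3))
    Sigma = A @ A.T + 3 * np.eye(3)                  # a symmetric positive definite covariance

    # Route 1: eigendecomposition Sigma = Q Lambda Q^T, then S = Lambda^{-1/2} Q^T, as in (22)
    eigvals, Q = np.linalg.eigh(Sigma)
    S_eig = np.diag(eigvals ** -0.5) @ Q.T

    # Route 2: Cholesky factorization of the precision matrix, Sigma^{-1} = R^T R, then S = R
    L = np.linalg.cholesky(np.linalg.inv(Sigma))     # lower-triangular L with L L^T = Sigma^{-1}
    S_chol = L.T

    # Both factors satisfy S^T S = Sigma^{-1}, as required by (20)
    for S in (S_eig, S_chol):
        assert np.allclose(S.T @ S, np.linalg.inv(Sigma))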


Cite this article

Shaker, M., Myhre, J.N. & Erdogmus, D. Computationally Efficient Exact Calculation of Kernel Density Derivatives. J Sign Process Syst 81, 321–332 (2015). https://doi.org/10.1007/s11265-014-0904-1
