Sparse probability density function estimation using the minimum integrated square error

doi:10.1016/j.neucom.2013.02.003

Neurocomputing

Volume 115, 4 September 2013, Pages 122-129

https://doi.org/10.1016/j.neucom.2013.02.003

Abstract

We develop a new sparse kernel density estimator using a forward constrained regression framework, within which the nonnegative and summing-to-unity constraints of the mixing weights can easily be satisfied. Our main contribution is to derive a recursive algorithm that selects significant kernels one at a time, using the minimum integrated square error (MISE) criterion for both the selection of kernels and the estimation of mixing weights. The proposed approach is simple to implement and the associated computational cost is very low. Specifically, the complexity of our algorithm is of the order of the number of training data samples N, which is much lower than the order of N² offered by the best existing sparse kernel density estimators. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with comparable accuracy to those of the classical Parzen window estimate and other existing sparse kernel density estimators.

Introduction

The finite mixture model [1] is a general approach to the probability density function (PDF) estimation problem that is fundamental to many pattern recognition, data analysis and other engineering applications [2], [3], [4], [5], [6], [7]. The celebrated Parzen window (PW) estimate [8] can be regarded as a special case of the finite mixture model, in which the number of mixtures is equal to the number of training data samples and all the mixing weights are equal. However, evaluating the PW density estimate at a future data sample can be computationally expensive if the number of training data samples is very large. Much of the existing work on fitting a finite mixture model is based on fixing the number of mixtures and applying the expectation-maximisation (EM) algorithm [9] to provide the maximum likelihood (ML) estimate of the mixture model's parameters. The associated ML optimisation is, in general, a highly nonlinear optimisation process requiring extensive computation, but for the Gaussian mixture model the EM algorithm can be derived in an explicit iterative form [10]. However, this EM algorithm based ML estimation is well known to be ill posed and to converge slowly, and tackling the associated numerical difficulties often requires resampling techniques [11], [12], [13], [14]. In general, the correct number of mixture components is unknown, and simultaneously determining the required number of mixture components and estimating the associated parameters of the finite mixture model is a challenging problem. Hence it is highly desirable to develop new methods of fitting a finite mixture model with the capability to infer a minimum number of mixtures from the data automatically and efficiently.
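As a concrete illustration of the PW estimate described above, the following is a minimal sketch assuming a Gaussian kernel with a common width. The function names `gaussian_kernel` and `parzen_window`, the width `rho = 0.3` and the synthetic data are illustrative assumptions, not taken from the paper. It shows why evaluation cost grows with the training set: every query point is compared against all N training samples.

```python
import numpy as np

def gaussian_kernel(x, centres, rho):
    """Gaussian kernel values for every pair; x is (n, m), centres is (N, m)."""
    m = x.shape[1]
    sq_dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * rho ** 2) ** (m / 2.0)
    return np.exp(-sq_dist / (2.0 * rho ** 2)) / norm

def parzen_window(x, data, rho):
    """PW estimate: one kernel per training sample, all weights equal to 1/N."""
    return gaussian_kernel(x, data, rho).mean(axis=1)

# Hypothetical one-dimensional data set, for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 1))
grid = np.linspace(-4.0, 4.0, 9).reshape(-1, 1)
pw = parzen_window(grid, data, rho=0.3)
```

Each call touches all 500 kernels for every query point, which is the cost a sparse estimator aims to reduce.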

There is considerable interest in research on sparse PDF estimation. The support vector machine (SVM) density estimation technique has been proposed [15], [16], in which density estimation is formulated as a supervised learning problem: the mean absolute deviation between the empirical cumulative distribution function (CDF) calculated from the training data and the CDF of the PDF estimator, also calculated from the training data, is minimised. The optimisation in the SVM method solves a constrained quadratic optimisation problem. This yields the sparsity inducing property, i.e. at the optimum, many kernels' weights are driven to zero. Alternatively, a novel regression-based PDF estimation method has been introduced [17], in which the empirical CDF is constructed, in the same manner as in the SVM density estimation approach, to be used as the desired response. The orthogonal forward regression (OFR) approach is an efficient supervised regression model construction method [18]. In order to automatically determine the model structure with improved model generalisation, the OFR method has been combined with a leave-one-out test score and local regularisation [19], [20]. The regression-based idea of [17] and the approach in [19], [20] have been extended to yield a new OFR based sparse density estimation algorithm [21] which is capable of automatically constructing very sparse kernel density estimates, with comparable performance to that of the PW estimate. In [17], [21], the regressors are the CDFs of the kernels and the target response is the empirical CDF. This becomes inconvenient for the many types of kernels whose corresponding CDFs are difficult to compute. A simple and viable alternative approach has been proposed that uses kernels directly as regressors by adopting the PW estimate as the target response [22].

The desirable sparsity inducing property also arises in the reduced set density estimator (RSDE) [23]. The RSDE differs from the SVM in that it is based on the minimisation of the integrated square error (ISE) between the estimator and the true density. The minimum integrated square error (MISE) is a classical goodness of fit criterion for probability density estimation [2], [23], [24]. The optimisation problem in the RSDE is a constrained quadratic one, and two efficient optimisation algorithms were introduced for it: multiplicative updating of the weighting coefficients and sequential minimal optimisation. Each has a complexity of $O (N^{2})$ per iteration, compared to $O (N^{3})$ for a standard quadratic optimisation solver, where N is the number of data samples and $O (M)$ denotes the order of M. Note that the RSDE is mainly restricted to using the Gaussian kernel, but the sparse density estimators of [21], [22] do not have this restriction. The complexity of the sparse density estimators [21], [22] is also $O (N^{2})$ scaled by the number of regressors selected, which is generally very small. Our extensive experience has shown that all the sparse density estimators [15], [16], [21], [22], [23] discussed here are capable of automatically producing sparse PDF estimates with comparable performance to that of the PW estimate, but the density estimators of [21], [22], [23] produce much sparser estimates than the SVM based density estimator.
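For reference, the ISE criterion mentioned above can be expanded using a standard identity. The symbols $C$ and $b$ below are notation assumed here for illustration and may differ from the paper's own:

$$\mathrm{ISE} = \int \big( \hat{p} (x) - p (x) \big)^{2} dx = \int \hat{p}^{2} (x) dx - 2 \int \hat{p} (x) p (x) dx + \int p^{2} (x) dx .$$

The last term does not depend on the estimator and can be dropped. The middle term is the expectation of $\hat{p} (x)$ under $p (x)$, so it can be approximated by a sample average over the training data. For a kernel estimate $\hat{p} (x) = \sum_{j} β_{j} K_{ρ} (x, x_{j})$ this gives, up to a constant,

$$\mathrm{ISE} \approx β^{T} C β - 2 β^{T} b, \quad C_{i j} = \int K_{ρ} (x, x_{i}) K_{ρ} (x, x_{j}) dx, \quad b_{j} = \frac{1}{N} \sum_{k = 1}^{N} K_{ρ} (x_{k}, x_{j}) .$$

For Gaussian kernels of width $ρ$ the integral has the closed form $C_{i j} = K_{\sqrt{2} ρ} (x_{i}, x_{j})$, and $b_{j}$ is the PW estimate evaluated at $x_{j}$. This is why the criterion leads to a constrained quadratic problem in the weights.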

Against this background, this paper introduces a new algorithm for sparse kernel density estimation based on the MISE and the forward constrained regression (FCR) [25]. In our proposed new sparse kernel density estimator, referred to as the FCR-MISE algorithm, kernel terms are selected one at a time: at each step, the kernel with the minimum ISE value among all the candidate kernels formed from the data points is chosen. Within the FCR framework, the mixing weights are computed using a recursion linking the weight for the newly selected kernel and the set of the mixing weights of the previous stages [25]. Thus the parameter estimation problem is reduced to a one-dimensional one, which is shown to have a closed-form solution using the MISE criterion. The proposed density estimation algorithm is very efficient due to the recursive computation and the closed-form solution of only one parameter per step. Specifically, the complexity of our proposed new algorithm is $O (N)$ scaled by the squared number of kernels selected. Numerical examples are employed to demonstrate that our new sparse kernel density estimator is capable of producing very sparse PDF estimates with comparable accuracy to those of the PW estimator and other existing sparse kernel density estimators.
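The idea of selecting one kernel per step and solving for a single weight in closed form can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' algorithm: it assumes Gaussian kernels, a convex-combination update (new estimate = (1 - λ) × previous estimate + λ × new kernel) and an empirical ISE in which the PW estimate at each centre stands in for the true density. The names `gauss_gram`, `forward_mise_sketch`, `lam` and `n_steps` are hypothetical. The sketch also builds full N × N matrices, so it does not attain the $O (N)$ complexity claimed for the paper's recursion.

```python
import numpy as np

def gauss_gram(a, b, rho):
    """Gaussian kernel values K_rho(a_i, b_j) for a of shape (n, m), b of shape (N, m)."""
    m = a.shape[1]
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * rho ** 2)) / (2.0 * np.pi * rho ** 2) ** (m / 2.0)

def forward_mise_sketch(data, rho, n_steps):
    """Greedy convex-combination kernel selection under an empirical ISE (illustrative)."""
    n = data.shape[0]
    cross = gauss_gram(data, data, np.sqrt(2.0) * rho)  # integral of K_i times K_j
    b = gauss_gram(data, data, rho).mean(axis=1)        # PW estimate at each centre
    d = np.diag(cross)                                  # integral of K_j squared
    beta = np.zeros(n)
    # First step: a single kernel with weight one, chosen by the smallest ISE term.
    beta[np.argmin(d - 2.0 * b)] = 1.0
    for _ in range(1, n_steps):
        a_prev = beta @ cross @ beta   # integral of the previous estimate squared
        b_prev = beta @ b              # previous estimate against the PW target
        c = cross @ beta               # integral of previous estimate times K_j
        denom = a_prev - 2.0 * c + d   # integral of (previous - K_j)^2, nonnegative
        # Closed-form minimiser of the quadratic in lam, clipped to [0, 1].
        lam = np.clip((a_prev - c - b_prev + b) / np.maximum(denom, 1e-12), 0.0, 1.0)
        ise = ((1 - lam) ** 2 * a_prev + 2 * lam * (1 - lam) * c + lam ** 2 * d
               - 2 * (1 - lam) * b_prev - 2 * lam * b)
        j = np.argmin(ise)
        beta = (1.0 - lam[j]) * beta   # shrink the weights of earlier kernels
        beta[j] += lam[j]              # add the newly selected kernel
    return beta

# Hypothetical two-component data set, for illustration only.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(1.5, 1.0, 150)]).reshape(-1, 1)
beta = forward_mise_sketch(data, rho=0.5, n_steps=10)
```

Because every update is a convex combination, the weights stay nonnegative and sum to one without an explicit constrained solver, and at most `n_steps` of them are nonzero.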

The paper is organised as follows. Section 2 introduces the idea of sparse kernel density estimator construction via the FCR framework. Section 3 proposes the new algorithm of joint kernel selection and mixing weight estimation based on the MISE and FCR. Numerical experiments are utilised to illustrate the effectiveness of the proposed algorithm in Section 4 and our conclusions are given in Section 5.

Section snippets

Sparse kernel density estimator construction via forward constrained regression

Given the finite data set $D_{N} = \{x_{j}\}_{j = 1}^{N}$ consisting of N data samples, where the data vector $x_{j} \in \mathbb{R}^{m}$ follows an unknown PDF $p (x)$ , the problem under study is to find a sparse approximation of $p (x)$ based on $D_{N}$. A general kernel based density estimate of $p (x)$ is given by ${\hat{p}}^{(N)} (x; β_{N}, ρ) = \sum_{j = 1}^{N} β_{j} K_{ρ} (x, x_{j})$ subject to $β_{j} \geq 0$, $1 \leq j \leq N$, and $β_{N}^{T} 1_{N} = 1$, where the $β_{j}$ are the kernel weights, $β_{N} = [β_{1} \ β_{2} \ \dots \ β_{N}]^{T}$ , and $1_{N}$ is the N-dimensional vector whose elements are all equal to one, while $K_{ρ} (x, x_{j})$ is a chosen kernel function with the

Joint kernel selection and weight estimation based on the MISE

The MISE between a PDF estimator and the true density is a classical goodness of fit criterion for both nonparametric density estimation [2], [23] and parametric density estimation [24]. In the following, we introduce a new algorithm integrating the kernel term selection and the kernel weight estimation based on the MISE measure, within the general FCR framework described in the previous section. More specifically, the joint kernel selection and weight estimation at the lth forward selection

Simulation study

The first two examples are pure PDF estimation examples. In each of these two examples, a data set of N samples was randomly drawn from a distribution $p (x)$ and used to construct the PDF estimator ${\hat{p}}^{(s)} (x; β_{s}, ρ)$ using the proposed FCR-MISE approach. A separate test data set of $N_{test} = 10\,000$ samples was used for evaluating the density estimate according to the L₁ test error $L_{1} = \frac{1}{N_{test}} \sum_{k = 1}^{N_{test}} | p (x_{k}) - {\hat{p}}^{(s)} (x_{k}; β_{s}, ρ) | .$ The experiment was repeated for 100 different random runs. The benchmark PDF estimators
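The L₁ test error defined above is a mean absolute difference over the test points, which can be sketched as follows. The helper name `l1_test_error` and the two Gaussian densities used as stand-ins for the true PDF and the estimate are assumptions for illustration and do not reproduce the paper's experiments.

```python
import numpy as np

def l1_test_error(true_pdf, est_pdf, x_test):
    """Mean absolute difference between true and estimated densities on test points."""
    return np.mean(np.abs(true_pdf(x_test) - est_pdf(x_test)))

# Hypothetical check: a standard normal density against a slightly wider Gaussian.
def std_normal(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def wide_normal(x, s=1.1):
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
x_test = rng.normal(size=10_000)  # test samples drawn from the true density
err = l1_test_error(std_normal, wide_normal, x_test)
```

Note that the test points are drawn from the true distribution, so the error is weighted towards regions where the density is high.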

Conclusions

In this paper, a new sparse kernel density estimator has been derived using the forward constrained regression procedure. Our novel contribution is to derive a recursive algorithm which selects significant kernels one at a time based on the minimum integrated square error criterion within the FCR procedure. The most significant advantage of our proposed FCR-MISE approach is that it has an extremely low computational complexity, since at each FCR step, only a single parameter is estimated using a

Acknowledgements

X. Hong acknowledges the support of the UK EPSRC.

The authors acknowledge the financial support of King Abdulaziz University (Grant No. 1-4-1432/HiCi).

Xia Hong received her B.Sc. and M.Sc. degrees from the National University of Defence Technology, China, in 1984 and 1987, respectively, and her Ph.D. degree from the University of Sheffield, UK, in 1998, all in automatic control.

References (30)

Z.R. Yang et al., Robust maximum likelihood training of heteroscedastic probabilistic neural networks, Neural Networks (1998)

M. Svensén et al., Robust Bayesian mixture modelling, Neurocomputing (2005)

C. Archambeau et al., Robust Bayesian clustering, Neural Networks (2007)

S. Chen et al., An orthogonal forward regression technique for sparse kernel density estimation, Neurocomputing (2008)

G.J. McLachlan et al., Finite Mixture Models (2000)

B.W. Silverman, Density Estimation for Statistics and Data Analysis (1986)

R.O. Duda et al., Pattern Classification and Scene Analysis (1973)

C.M. Bishop, Neural Networks for Pattern Recognition (1995)

H. Wang, Robust control of the output probability density functions for multivariable stochastic systems with guaranteed stability, IEEE Trans. Autom. Control (1999)

S. Chen et al., Adaptive minimum-BER linear multiuser detection for DS-CDMA signals in multipath channels, IEEE Trans. Signal Process. (2001)

S. Chen et al., Particle swarm optimization aided orthogonal forward regression for unified data modelling, IEEE Trans. Evol. Comput. (2010)

E. Parzen, On estimation of a probability density function and mode, Ann. Math. Stat. (1962)

A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B (1977)

J.A. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and...

B. Efron et al., An Introduction to the Bootstrap (1993)


She worked as a research assistant in Beijing Institute of Systems Engineering, Beijing, China from 1987 to 1993. She worked as a research fellow in the School of Electronics and Computer Science at the University of Southampton, UK, from 1997 to 2001. Since 2001, she has been with the School of Systems Engineering, the University of Reading, UK, where she currently holds a Readership post. She is actively engaged in research into nonlinear systems identification, data modelling, estimation and intelligent control, neural networks, pattern recognition, learning theory and their applications. She has published over 100 research papers, and coauthored a research book. She was awarded a Donald Julius Groen Prize by the Institution of Mechanical Engineers in 1999.

Sheng Chen received his B.Eng. degree from the East China Petroleum Institute, China, in January 1982, and his Ph.D. degree from the City University, London, in September 1986, both in control engineering. In 2005, he was awarded the higher doctorate degree, Doctor of Sciences (DSc), from the University of Southampton, Southampton, UK.

From 1986 to 1999, he held research and academic appointments at the Universities of Sheffield, Edinburgh and Portsmouth, all in the UK. Since 1999, he has been with Electronics and Computer Science, the University of Southampton, UK, where he currently holds the post of Professor in Intelligent Systems and Signal Processing. His research interests include adaptive signal processing, wireless communications, modelling and identification of nonlinear systems, neural network and machine learning, intelligent control system design, evolutionary computation methods and optimisation. He has published over 460 research papers.

He is a Fellow of IEEE and a Fellow of IET. He is a Distinguished Adjunct Professor at the King Abdulaziz University, Jeddah, Saudi Arabia. He is an ISI highly cited researcher in the engineering category (March 2004).

Abdulrohman Qatawneh received his B.S. degree from the University of Jordan in 1994 and his M.S. degree from the Jordan University of Science and Technology in 1997, both in electrical engineering. He received his Ph.D. degree in telecommunication engineering from the Polytechnic University of Madrid in 2006.

He is currently an assistant professor at the King Abdulaziz University, Saudi Arabia. His research interests include mobile communications systems, differential MIMO coding and computational intelligence algorithms.

Khaled Daqrouq received his B.S. and M.S. degrees in biomedical engineering from the Wroclaw University of Technology in Poland, in 1995, as one certificate, and his Ph.D. degree in electronics engineering from the Wroclaw University of Technology, Poland, in 2001.

He is currently an associate professor at the King Abdulaziz University, Saudi Arabia. His research interests are in ECG signal processing, wavelet transform applications for speech recognition, as well as in the general area of speech and audio signal processing and improving auditory prostheses in noisy environments.

Muntasir Sheikh received the B.S. degree in electronics and communication engineering from the King Abdulaziz University, Saudi Arabia, in 1987, the M.Sc. degree in RF communications engineering from the University of Bradford, UK, in 1991, and the Ph.D. degree in electrical engineering from the University of Arizona, USA, in 1999.

Since 1999, he has been with the King Abdulaziz University, Saudi Arabia, where he is currently an assistant professor. His research interests are in remote monitoring for security applications and robotics.

Ali Morfeq received the B.S. degree in computer engineering from the King Abdulaziz University, Saudi Arabia, in 1982, the M.S. degree in computer engineering from the Oregon State University, Corvallis, USA, in 1985, and the Ph.D. degree in computer science from the University of Colorado, Boulder, USA, in 1990.

Since 1990, he has been with the King Abdulaziz University, Saudi Arabia, where currently he is an assistant professor and the chair of the Electrical & Computer Engineering Department, Faculty of Engineering. His research interests are in software engineering, and software systems for hospital applications.
