A fast algorithm for computing least-squares cross-validations for nonparametric conditional kernel density functions

doi:10.1016/j.csda.2009.08.021

Computational Statistics & Data Analysis

Volume 54, Issue 12, 1 December 2010, Pages 3404-3410

https://doi.org/10.1016/j.csda.2009.08.021

Abstract

Nonparametric conditional density functions are widely used in applied econometric and statistical modelling because they provide enriched information summaries of the relationships between dependent and independent variables. Although least-squares cross-validation is considered to be the best criterion for bandwidth selection of the kernel estimator of the conditional density, the number of computations required for this procedure grows rapidly (cubically) as the number of observations increases. A fast algorithm is proposed to reduce this computational cost, and its accuracy and efficiency are verified via numerical experiments. A practical application is also presented to demonstrate the algorithm’s potential usefulness.

Introduction

Nonparametric conditional density functions have become popular in applied econometric and statistical modelling. They provide summarized information concerning the relationships between independent and dependent variables. Moreover, they are useful for data-driven modelling, such as nonparametric quantile regression, discrete choice modelling, and direct estimation of conditional probability density and distribution functions (see Racine, 2008, for a recent review). The kernel method is a commonly used nonparametric modelling approach, and many studies have examined nonparametric conditional kernel density functions (NP-CKDFs) since the pioneering work of Rosenblatt (1969).

Bandwidth selection is an important consideration for the relevant kernel estimator of the nonparametric conditional density, and it can be accomplished via several techniques. The most popular of these is the plug-in method (e.g. Li and Racine, 2007, Chapter 5), in which the optimal bandwidth is fairly easy to calculate. However, this technique employs a normal distribution to assign a value to the unknown constant in the optimal bandwidth (the bandwidth that minimizes the integrated mean square error). It therefore implicitly assumes that the form of the underlying densities is known a priori. Racine (2008) pointed out that the plug-in method tends to oversmooth the bandwidth and yields biased results for larger datasets. Bashtannyk and Hyndman (2001) compared several bandwidth selection strategies for NP-CKDFs that were previously introduced by Hyndman et al. (1996). They found the bootstrap method to be the best, although it requires a considerable amount of computation. Hall (1987) studied log-likelihood cross-validation methods and pointed out that log-likelihood cross-validation also tends to oversmooth the bandwidth for larger datasets. Holmes et al. (2007) recently developed dual-tree-based algorithms for the bandwidth selection of NP-CKDFs. Although this technique dramatically reduces the computational cost, it is only applicable to log-likelihood cross-validation criteria.

Fan and Yim (2004) and Hall et al. (2004) investigated the least-squares cross-validation (LS-CV) method for selecting the bandwidth of NP-CKDFs. This criterion might be considered the best, in the sense that it minimizes a (weighted) integrated squared error. Fan and Yim (2004) compared the LS-CV method with other conventional techniques and obtained very favourable results concerning its performance. Hall et al. (2004) used the LS-CV method to detect irrelevant/relevant explanatory variables in nonparametric conditional densities. The need for implementation of this technique continues, and the method has already been included in the statistical software R (R Development Core Team, 2007) as a package named “np” (Hayfield and Racine, 2009).

In spite of its advantages, the practical application of LS-CV to NP-CKDFs has been impeded by its computational complexity. As will be seen in the next section, the evaluation of the objective function for the LS-CV criterion requires $O(n^{3})$ operations, which leads to enormous computational costs when the number of observations (denoted by $n$) increases. In this paper, we develop a fast algorithm for computing the LS-CV function for NP-CKDFs. We employ a Gaussian kernel function, which is the type most widely used in applications. In addition, we restrict our attention to NP-CKDFs with one independent variable. We remark that although there are other established approaches to computational cost reduction (for example, Gray and Moore (2003) and Racine (2002)), ours differs from these by being based on an expansion of the kernel function.
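To illustrate what an expansion of the Gaussian kernel can buy, the following Python sketch evaluates a one-dimensional Gaussian sum $\sum_{j} q_{j} \exp\{-(t - s_{j})^{2} / (2 h^{2})\}$ in two ways: directly, and through truncated Hermite expansions about box centres, which is the basic device of the fast Gauss transform of Greengard and Strain (1991). It is not the algorithm proposed in this paper. The function names `gauss_sum_direct` and `gauss_sum_hermite`, the box width $\sqrt{2} h$, and the truncation order `p = 20` are assumptions chosen for illustration, and the sketch omits the far-box pruning and local Taylor expansions of a full fast Gauss transform.

```python
import numpy as np

def gauss_sum_direct(targets, sources, weights, h):
    """Direct O(N*M) evaluation of sum_j q_j * exp(-(t - s_j)^2 / (2 h^2))."""
    d = targets[:, None] - sources[None, :]
    return np.exp(-0.5 * (d / h) ** 2) @ weights

def gauss_sum_hermite(targets, sources, weights, h, p=20):
    """Same sum via Hermite expansions truncated after p terms (illustrative)."""
    scale = np.sqrt(2.0) * h                  # sqrt(delta) with delta = 2 h^2
    # Boxes of width `scale`, so that |s_j - centre| / scale <= 1/2.
    box = np.floor(sources / scale).astype(int)
    out = np.zeros_like(targets, dtype=float)
    for b in np.unique(box):
        in_box = box == b
        c = (b + 0.5) * scale                 # centre of this box
        u = (sources[in_box] - c) / scale
        # Moments A_n = sum_j q_j * u_j^n / n!, for n = 0, ..., p - 1.
        A = np.empty(p)
        term = weights[in_box].astype(float)
        for n in range(p):
            A[n] = term.sum()
            term = term * u / (n + 1)
        # Hermite functions h_n(v) = H_n(v) * exp(-v^2), built by the
        # recurrence h_{n+1}(v) = 2 v h_n(v) - 2 n h_{n-1}(v).
        v = (targets - c) / scale
        h_prev = np.exp(-v ** 2)              # h_0
        h_curr = 2.0 * v * h_prev             # h_1
        out += A[0] * h_prev
        for n in range(1, p):
            out += A[n] * h_curr
            h_prev, h_curr = h_curr, 2.0 * v * h_curr - 2.0 * n * h_prev
    return out
```

The source points enter only through the moments `A`, which are computed once per box and reused for every target point. This separation of source and target contributions is what makes expansion-based methods cheaper than the direct double loop when both point sets are large.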

The outline of the paper is as follows. In Section 2, we clarify the computational difficulties in the LS-CV criterion and present the proposed algorithm. Section 3 describes numerical experiments designed to verify the accuracy and efficiency of the algorithm compared to the conventional method. In Section 4, the potential practicality of the algorithm is demonstrated via an application to the analysis of travel time variation in highway traffic, using an actual large dataset.

Section snippets

A fast algorithm for computing least-squares cross-validations for nonparametric conditional kernel density functions

Let $(X_{i}, Y_{i})$, $i = 1, \dots, n$, be an iid sample of an independent and dependent variable pair. In the case of one independent variable, the LS-CV criterion for NP-CKDFs selects the bandwidths $(h_{x}, h_{y})$ by minimizing the following objective function (Li and Racine, 2007, pp. 157–160): $CV_{f}(h_{x}, h_{y}) = \frac{1}{n} \sum_{i = 1}^{n} \frac{\hat{G}_{-i}(X_{i}, Y_{i})}{\hat{\mu}_{-i}(X_{i})^{2}} - \frac{2}{n} \sum_{i = 1}^{n} \frac{\hat{g}_{-i}(X_{i}, Y_{i})}{\hat{\mu}_{-i}(X_{i})},$ where $\begin{cases} \hat{\mu}_{-i}(X_{i}) = \frac{1}{n - 1} \sum_{j = 1, j \neq i}^{n} K_{h_{x}}(X_{i}, X_{j}), \\ \hat{g}_{-i}(X_{i}, Y_{i}) = \frac{1}{n - 1} \sum_{j = 1, j \neq i}^{n} w_{h_{y}}(Y_{i}, Y_{j}) K_{h_{x}}(X_{i}, X_{j}), \\ \hat{G}_{-i}(X_{i}, Y_{i}) = \frac{1}{(n - 1)^{2}} \sum_{j = 1, j \neq i}^{n} \sum_{l = 1, l \neq i}^{n} K_{h_{x}}(X_{i}, X_{j}) K_{h_{x}}(X_{i}, X_{l}) \, ww(Y_{j}, Y_{l}), \\ ww(Y_{j}, Y_{l}) = \int_{-\infty}^{\infty} w_{h_{y}}(y, Y_{j}) \, w_{h_{y}}(y, Y_{l}) \, dy. \end{cases}$
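The cost of evaluating this objective function directly can be seen from a minimal Python sketch. It assumes that both kernels are Gaussian and take the form $K_{h}(a, b) = \phi((a - b)/h)/h$, with $\phi$ the standard normal density; this notation is an assumption, since the kernel definitions are not shown in this excerpt. Under that assumption the integral $ww(Y_{j}, Y_{l})$ is the convolution of two Gaussian kernels and equals a Gaussian kernel with bandwidth $\sqrt{2} h_{y}$ evaluated at $Y_{j} - Y_{l}$. The names `gauss` and `lscv_naive` are hypothetical, and the code is the conventional direct evaluation, not the fast algorithm.

```python
import numpy as np

def gauss(u, h):
    """Gaussian kernel with bandwidth h, evaluated at differences u."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def lscv_naive(x, y, hx, hy):
    """Direct evaluation of CV_f(hx, hy): O(n^2) work per i, O(n^3) overall."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # ww(Y_j, Y_l) for all pairs: Gaussian with bandwidth sqrt(2) * hy
    # (closed form valid for the Gaussian kernel only; needs O(n^2) memory).
    ww = gauss(y[:, None] - y[None, :], np.sqrt(2.0) * hy)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i            # leave observation i out
        k = gauss(x[i] - x[mask], hx)       # K_hx(X_i, X_j), j != i
        w = gauss(y[i] - y[mask], hy)       # w_hy(Y_i, Y_j), j != i
        mu = k.sum() / (n - 1)
        g = (w * k).sum() / (n - 1)
        # Double sum over j and l, written as a quadratic form in k.
        G = k @ ww[np.ix_(mask, mask)] @ k / (n - 1) ** 2
        total += G / mu ** 2 - 2.0 * g / mu
    return total / n
```

The quadratic form for `G` touches $(n - 1)^{2}$ kernel products for every $i$, which is where the $O(n^{3})$ operation count comes from. The sketch does not guard against $\hat{\mu}_{-i}(X_{i})$ underflowing to zero for isolated points, which a practical implementation would need to handle.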

Numerical experiment I

In this section, we verify the accuracy and efficiency of the proposed algorithm via numerical experiments using artificial datasets. For a given pair $n$ and $n_{x}$, an artificial dataset, denoted by $\mathit{dataset}_{(n, n_{x})}$, can be created using Algorithm 2, as in Takeuchi et al. (2006). Datasets were generated for the following cases: $n = 625 \times 2^{i - 1}$ and $n_{x} = 500 \times 2^{j - 1}$, where $i = 1, \dots, 12$ and $j = 1, 2, 3$. Numerical experiments were carried out serially using one core of the processor of a personal computer (2.2 GHz,

Numerical experiment II

In this section, we verify the practicality of the proposed method by using it to analyze the relationship between time of day and travel time for actual traffic data. In traffic engineering applications, interest has grown in how to represent day-to-day and same-day variations in the travel time of traffic on urban roads (see Hollander and Liu, 2008, for recent examples). In particular, it is important to be able to estimate the distribution of travel times (TT) on an urban road conditional

Concluding remarks

This paper presents a fast algorithm for computing LS-CV for NP-CKDFs based on the fast Gauss transform (FGT) and computational decomposition. Its accuracy and computational efficiency have been verified by numerical experiments. The proposed algorithm is about $3 \times 10^{11}$ times faster than the conventional algorithm for large datasets. An application involving travel time underscores its potential practicality as well as the advantages of its appropriate use for large-scale datasets. Tune-up algorithms of this sort

Acknowledgement

We are grateful to Dr. Mogens Fosgerau for providing the travel time dataset.

References (20)

D. Bashtannyk et al. (2001). Bandwidth selection for kernel conditional density estimation. Computational Statistics and Data Analysis.
Y. Hollander et al. (2008). Estimation of the distribution of travel times by repeated simulation. Transportation Research Part C: Emerging Technologies.
J. Racine (2002). Parallel distributed kernel estimation. Computational Statistics and Data Analysis.
A. Elgammal et al. (2003). Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence.
J. Fan et al. (2004). A crossvalidation method for estimating conditional densities. Biometrika.
Fosgerau, M., Hjorth, K., Brems, C., Fukuda, D., 2008. Travel time variability: Definition and valuation. Tech. Rep.,...
Gray, A., Moore, A., 2003. Very fast multivariate kernel density estimation via computational geometry. In: Joint Stat....
L. Greengard et al. (1991). The fast Gauss transform. SIAM Journal on Scientific Computing.
P. Hall (1987). On Kullback–Leibler loss and density estimation. Annals of Statistics.
P. Hall et al. (2004). Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association.

There are more references available in the full text version of this article.

