Stochastics and Statistics
Fusion of hard and soft information in nonparametric density estimation

https://doi.org/10.1016/j.ejor.2015.06.034

Highlights

  • We address how to fuse hard and soft information for probability density estimation.

  • We formulate the problem as a stochastic optimization model.

  • We examine convexity, consistency, and asymptotics.

  • We illustrate the approach in a variety of settings.

Abstract

This paper discusses univariate density estimation in situations when the sample (hard information) is supplemented by “soft” information about the random phenomenon. These situations arise broadly in operations research and management science where practical and computational reasons severely limit the sample size, but problem structure and past experiences could be brought in. In particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum likelihood estimator that incorporates any, possibly random, soft information through an arbitrary collection of constraints. We illustrate the breadth of possibilities by discussing soft information about shape, support, continuity, smoothness, slope, location of modes, symmetry, density values, neighborhood of known density, moments, and distribution functions. The maximization takes place over spaces of extended real-valued semicontinuous functions and therefore allows us to consider essentially any conceivable density as well as convenient exponential transformations. The infinite dimensionality of the optimization problem is overcome by approximation with splines tailored to these spaces. To facilitate the treatment of small samples, the construction of these splines is decoupled from the sample. We discuss existence and uniqueness of the estimator, examine consistency under increasing hard and soft information, and give rates of convergence. Numerical examples illustrate the value of soft information, the ability to generate a family of diverse densities, and the effect of misspecification of soft information.

Introduction

It is recognized that statistical estimates can be improved greatly by including contextual information to supplement the information derived from data. We refer to the contextual information as soft information, in contrast to hard information derived from observations (data). In this paper, we consider univariate probability density estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample size, not surprisingly, it is when the sample size is small that this fusion of hard and soft information plays a crucial role in producing quality estimates. We limit the scope to densities of random variables with distributions that are absolutely continuous with respect to the Lebesgue measure on a bounded interval.

The need for estimating probability density functions is prevalent across operations research and management science. For example, an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see for example Barton, Nelson, and Xie (2010); Chick (2001); Freimer and Schruben (2002). Density estimation is also needed when populating probability models and when analyzing simulation output beyond the typical first and second moments. In all these situations, however, the available sample is typically extremely small due to practical and computational limitations, and one is usually forced to restrict attention to parametric families of densities. In this paper, we provide the theoretical foundations of an alternative approach that brings in soft information about problem structure and past experience to obtain reasonable nonparametric density estimates even for very small sample sizes. The approach has been successfully applied in the context of simulation output analysis (Singham, Royset, & Wets, 2013), uncertainty quantification (Royset, Sukumar, & Wets, 2013), as well as estimation of errors in forecasts for commodity prices (Wets & Rios, under review) and electricity demand (Feng, Gade, Ryan, Watson, Wets, & Woodruff, 2013); see also Rios, Wets, and Woodruff (under review).

A natural and widely studied approach to density estimation is to adopt an M-estimator with additional constraints to account for soft information. We continue this tradition by defining an estimator that is an optimal solution of a constrained maximum likelihood problem. An appealing property of such estimators is that for any sample size, an estimate is the best possible within the class of allowable functions according to the given criterion (likelihood).
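To fix ideas before the formal development in Section 2, the estimator can be sketched as follows; the notation here is ours, simplified from the paper's:

\[
\hat{h}^n \in \operatorname*{argmax}_{h} \Big\{ \tfrac{1}{n}\sum_{i=1}^{n} \log h(X_i) \;:\; h \text{ a density on } [m_0,m_N] \text{ satisfying the soft-information constraints} \Big\},
\]

where $X_1,\dots,X_n$ is the sample. Whatever the sample size, $\hat{h}^n$ is then best, by the likelihood criterion, among all densities consistent with the soft information.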

We trace the consideration of soft information in terms of shape constraints at least back to Grenander (1956a, 1956b). More recent studies of univariate log-concave densities include Balabdaoui, Rufibach, and Wellner (2009); Dümbgen and Rufibach (2009); Groeneboom and Wellner (1992); Jongbloed (1998); Pal, Woodroofe, and Meyer (2007); Walther (2002), with computational comparisons in Rufibach (2007); see also the review Walther (2009) and, in the case of multivariate densities, e.g., Cule, Samworth, and Stewart (2010a, 2010b). Convexity and monotonicity restrictions are examined in Groeneboom, Jongbloed, and Wellner (2001); Meyer (2012b), and monotonicity, monotonicity combined with convexity, U-shape, as well as unimodality with known mode are studied in Meyer (2012b); Meyer and Habtzghi (2011). Unimodal functions are also covered in Hall and Kang (2005); Reboul (2005), with the former covering U-shape as well. Monotone, convex, and log-concave densities are dealt with in Birke (2009). Studies of k-monotone densities include Balabdaoui and Wellner (2007, 2010); Gao and Wellner (2009). Densities given as monotone transformations of convex functions are examined in Seregin and Wellner (2010). Convex formulations of a collection of shape restrictions are discussed in Papp (2011); Papp and Alizadeh (2014). We refer to the recent dissertation Doss (2013) and the discussion in Cule, Samworth, and Stewart (2010b) for a more comprehensive review, and to Lim and Glynn (2012) for the related context of shape-restricted regression.

Although these studies address important cases, there is no overarching framework that allows for a comprehensive description of soft information formulated by a large variety of constraints. Initial work in this direction is found in Wang (1996), which deals with parametric nonlinear least-squares regression subject to a finite number of smooth equality and inequality constraints. That paper examines the asymptotics of the least-squares estimator using the convergence theory of constrained optimization, specifically epi-convergence. In the context of constrained maximum likelihood estimation, Dong and Wets (2007) establishes consistency of an estimator through a functional law of large numbers and epi-convergence. The latter work is an immediate forerunner to the present paper.

Having adopted a nonparametric constrained maximum likelihood framework, we face technical challenges along two axes. First, one needs to deal with constrained optimization problems. Of course, in principle, constraints can be handled through penalties and regularizations; see for example Good and Gaskins (1971); Klonias (1982); Leonard (1978); de Montricher, Tapia, and Thompson (1975); Silverman (1982); Thompson and Tapia (1990) and more recently Bühlmann and van de Geer (2011); Eggermont and LaRiccia (2001); Koenker and Mizera (2006, 2008, 2010); Meyer (2012a); Turlach (2005). However, the equivalence and interpretation of such reformulations depend on the successful selection of multipliers and penalty parameters, which is far from trivial in practice, especially in the case of multiple constraints. In fact, poor selection of these multipliers and parameters may cause computational challenges due to ill-conditioning of the resulting optimization problem, as well as significant deterioration in the quality of the resulting density estimate. Moreover, it becomes unclear in what sense, if any, an estimator is “best” when an otherwise natural criterion such as likelihood is mixed with nonzero penalty terms; see Dong and Wets (2007) for further discussion. It is also possible to devise specialized algorithms, such as the iterative convex minorant algorithm (Groeneboom & Wellner, 1992; Jongbloed, 1998), to account for certain constraints, or to modify “unconstrained” estimators such as those based on kernels: Hall and Kang (2005) handles unimodality, Birke (2009) considers monotonicity, convexity, and log-concavity, and Davies and Kovac (2004) aims to reduce the number of modes; see Racine (2015); Wolters (2012) for computational tools. Again, it is unclear in what sense, if any, such estimates are “best” in the case of finite samples. Moreover, it is challenging to generalize these approaches to handle other types of soft information. We direct the reader to Tsybakov (2009) and references therein for treatments of kernel estimators, including a discussion of optimality.
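The distinction can be made concrete with a schematic comparison; the notation is ours, with $J$ a generic roughness or shape functional, $\lambda$ a penalty parameter, and $c$ a constraint bound:

\[
\max_{h}\ \frac{1}{n}\sum_{i=1}^{n}\log h(X_i)\;-\;\lambda\,J(h)
\qquad\text{versus}\qquad
\max_{h}\ \Big\{\frac{1}{n}\sum_{i=1}^{n}\log h(X_i)\;:\;J(h)\le c\Big\}.
\]

In the penalized form, a $\lambda$ must be tuned for every piece of soft information; in the constrained form, the bound $c$ states the soft information directly, and the maximizer retains the interpretation of being best within the constrained class.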

The second challenge with a nonparametric constrained maximum likelihood framework is the infinite dimensionality of the resulting optimization problem. Naturally, there is a computational need to consider families of approximating densities characterized by a finite number of parameters. The method of sieves (Chen, 2007; Geman & Hwang, 1982; Grenander, 1981) provides a framework for constructing, typically, finite-dimensional approximating subsets that are gradually refined as the sample size grows and that, in the limit, are dense in a function space of interest. However, difficulties arise from three directions. First, with our focus on small sample sizes, the linkage between sample size and sieves becomes untenable. Second, in order to allow for the possibility of discontinuous densities and exponential transformations, we choose as underlying space the extended real-valued lower or upper semicontinuous functions, but neither is a linear space. Consequently, the ingrained mathematical tendency to obtain a finite-dimensional approximation by relying on a well-chosen finite basis is problematic; see for example Delecroix and Thomas-Agnan (2000); Meyer (2012a) for such an approach based on splines. Third, despite progress towards handling shape restrictions on sieves (see for example Dechevsky and Penev (1997); DeVore (1977a, 1977b); Papp (2011); Papp and Alizadeh (2014)), there is no straightforward way of handling a comprehensive set of soft information.

In this paper, as in Dong and Wets (2007), we consider an arbitrarily constrained maximum likelihood estimator for densities. We appear to be the first to consider such general constraints (soft information) in the context of nonparametric density estimation. The soft information might even be random, i.e., not known a priori but realized with the sample. We give concrete formulations of the constrained maximum likelihood problem in the case of soft information about support bounds, semicontinuity, continuity, smoothness, slope information and related quantities, monotonicity, log-concavity, unimodality, location of modes, symmetry, bounds on density values, neighborhoods of known densities, bounds on moments, and bounds on cumulative distribution functions. We allow for any combination of these, and essentially any other constraint as well.
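To indicate the flavor of these formulations ahead of Section 3, write the density as $h = e^{-s}$; several of the items just listed then become simple conditions on $s$ (an illustrative selection, in our notation):

\begin{align*}
&\text{nonincreasing density on } [m_0,m_N]: && s \text{ nondecreasing};\\
&\text{log-concavity:} && s \text{ convex};\\
&\text{symmetry about } c: && s(c+x) = s(c-x) \text{ for all } x;\\
&\text{moment bound:} && l \le \int_{m_0}^{m_N} x\, e^{-s(x)}\,dx \le u.
\end{align*}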

We overcome the technical difficulty caused by constraints through the theory of constrained optimization, specifically epi-convergence, and therefore avoid tuning parameters related to penalties and regularization. With the exception of the preliminary work Dong and Wets (2007), this paper is the first to utilize epi-convergence to analyze constrained density estimators. We overcome the difficulty of infinite dimensionality through the use of a new class of splines, epi-splines (Royset & Wets, 2014), which are highly flexible, allow for discontinuities, and enable convenient exponential transformations. Here, for the first time, the theoretical foundations for using epi-splines in density estimation are laid out. In contrast to sieves, epi-splines can be constructed independently of the sample and therefore handle small sample sizes naturally. The precursor Dong and Wets (2007) relies on a finite approximation of L² by Fourier coefficients. In this paper, we consider the spaces of extended real-valued semicontinuous functions, exponential transformations, and epi-spline approximations.
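For orientation, we recall the notion informally; this is our paraphrase of Royset and Wets (2014), not a verbatim definition. Given a mesh $m_0 < m_1 < \cdots < m_N$, a pth-order epi-spline is a function that agrees with some polynomial of degree at most $p$ on each open mesh interval and takes arbitrary real values at the mesh points:

\[
s(x) = p_k(x) \ \text{ for } x \in (m_{k-1}, m_k), \quad \deg p_k \le p, \qquad s(m_k) \in \mathbb{R}, \ k = 0,\dots,N.
\]

Counting $p+1$ polynomial coefficients on each of the $N$ intervals plus $N+1$ mesh-point values yields the $(p+2)N+1$ parameters that appear in the finite-dimensional problems below.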

The reliance on epi-convergence and epi-splines allows us to view the constrained maximum likelihood problem as an approximation of a limiting optimization problem involving the actual probability density, correct soft information, and the full space of semicontinuous functions; we refer to Pflug and Wets (2013) for a related study in the context of regression utilizing graphical convergence. Consequently, we do not merely approximate a certain function space or contend with a finite sample size, but study the approximation of the whole estimation process as formulated by the limiting optimization problem. The approach facilitates the examination of families of estimators, such as those that are near-optimal solutions of a constrained maximum likelihood problem.

Our primary motivation is to obtain reasonable estimates in situations with little hard information: we provide a consistency result as soft information is refined, quantify finite-sample errors, and present a small computational study to motivate the estimator in that regard. Still, we also establish consistency and quantify asymptotic rates, as hard information is refined, under general constraints.

We focus exclusively on univariate densities that vanish beyond a compact interval of the real line. Although most of the results extend to the unbounded case and higher dimensions, technical issues will then become prominent and obscure the treatment of arbitrary random constraints and the supporting epi-spline approximations. Moreover, with a small sample, tail behavior can only come in via soft information, which is easily handled by our framework but omitted here for simplicity; a few experimental results can be found in Sood and Wets (2011).

The paper proceeds in Section 2 by defining the constrained maximum likelihood estimator, summarizing the underlying approximation theory, which is based on Royset and Wets (2014), and discussing existence and uniqueness. Section 3 exemplifies the breadth of soft information that can be included, and Section 4 provides consistency, asymptotics, and finite-sample error results. A small collection of numerical examples is featured in Section 5. The paper is summarized in Section 6.

Section snippets

Exponential Epi-Spline estimator

This section formulates a constrained maximum likelihood problem and presents a finite-dimensional approximation. We discuss existence, uniqueness, and computations. The section also includes the prerequisite approximation results.

Soft information

We implement soft information about the density under consideration in the estimation problem $(\tilde{P}^n_{p,m})$ through the set $R^n$, which can be any, possibly random, subset of $\mathbb{R}^{(p+2)N+1}$. It is observed empirically, and also illustrated in Section 5, that soft information tends to improve density estimates. In this section, we give a soft consistency theorem that, in part, explains these observations. We also give examples of constraints for specific instances of soft information. We start, however, with
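As an illustration of how such constraints reach the coefficient level (a hypothetical first-order example in our notation, not one of the paper's formulations): with $p = 1$ and $s(x) = a_k + b_k x$ on the $k$th mesh interval, the soft information "nonincreasing density", i.e., $s$ nondecreasing, becomes the linear inequalities

\[
b_k \ge 0, \quad k = 1,\dots,N, \qquad a_k + b_k m_k \;\le\; s(m_k) \;\le\; a_{k+1} + b_{k+1} m_k,
\]

so that $R^n$ is a closed convex subset of $\mathbb{R}^{3N+1}$.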

Consistency, asymptotics, and error bounds

Being concerned, from now on, with asymptotics, we again view $(P^n_{p,m})$ as a random optimization problem, i.e.,

\[
(P^n_{p,m}): \quad \min_{s \in S^n} \ \frac{1}{n}\sum_{i=1}^{n} s(X_i) \quad \text{such that} \quad \int_{m_0}^{m_N} e^{-s(x)}\,dx = 1,
\]

whose random elements are the variables $X_1,\dots,X_n$ and the random set $S^n$; we still designate a solution by $s^n$, which is now, itself, a random epi-spline. To establish consistency, derive asymptotics, and obtain other results, we view $\{(P^n_{p,m})\}_{n=1}^{\infty}$, for given $m$ and $p$, as a sequence of random optimization problems that under quite general
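A minimal computational sketch of the problem $(P^n_{p,m})$ above, assuming a zeroth-order (piecewise-constant) epi-spline and using scipy's general-purpose SLSQP solver rather than the authors' Matlab toolbox; the mesh, sample, and monotonicity constraint are illustrative choices:

```python
# Constrained ML sketch: density h = exp(-s) with s piecewise constant on a
# mesh of [m0, mN]; minimize (1/n) sum_i s(X_i) subject to the normalization
# constraint (exp(-s) integrates to one) and, as soft information, a
# nonincreasing density (s nondecreasing). Illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=10)        # a small "hard" sample
m0, mN, N = 0.0, 6.0, 20                       # support bound (soft info), mesh size
mesh = np.linspace(m0, mN, N + 1)
dx = np.diff(mesh)                             # interval lengths
# index of the mesh interval containing each observation
idx = np.clip(np.searchsorted(mesh, X, side="right") - 1, 0, N - 1)

def objective(s):
    # (1/n) sum_i s(X_i): minimizing this maximizes prod_i exp(-s(X_i))
    return s[idx].mean()

constraints = [
    # hard requirement: exp(-s) must integrate to one over [m0, mN]
    {"type": "eq", "fun": lambda s: np.sum(dx * np.exp(-s)) - 1.0},
    # soft information: nonincreasing density, i.e., s nondecreasing
    {"type": "ineq", "fun": lambda s: np.diff(s)},
]

res = minimize(objective, np.zeros(N), method="SLSQP", constraints=constraints)
density = np.exp(-res.x)                       # estimated density value per interval
```

Dropping the second constraint recovers a histogram-like unconstrained maximum likelihood estimate; additional soft information amounts to appending entries to `constraints`.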

Numerical examples

We illustrate the exponential epi-spline estimator through a series of examples using a freely available Matlab toolbox (Royset & Wets, 2013) that relies on the fmincon solver (Matlab 7.10.0); see also Buttrey, Royset, and Wets (2014) for a corresponding R toolbox. The focus is on showing the effect of including various sources of soft information in the context of small sample sizes. Section 5.1 shows estimates of an exponential density using 10 observations and an increasing collection of

Conclusions

We have developed a constrained maximum likelihood estimator that incorporates any soft information that might be available and therefore offers substantial flexibility for practitioners. In particular, in situations with few (hard) observations, soft information can be brought in and reasonable estimates can be achieved with as few as 10 sample points. In simple but illustrative examples of estimating exponential, normal, and mixture-of-exponential distributions, we construct new estimates

Acknowledgments

This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant numbers 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273. The authors thank the referees for insightful comments, Drs. R. Sood and D. Singham for carrying out a part of the numerical tests, and Prof. N. Sukumar for invigorating discussions.

References (75)

  • Birke, M. (2009). Shape constrained kernel density estimation. Journal of Statistical Planning and Inference.
  • Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics.
  • Attouch, H., et al. (1991). The topology of the ρ-Hausdorff distance. Annali di Matematica Pura ed Applicata.
  • Balabdaoui, F., Rufibach, K., & Wellner, J. A. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics.
  • Balabdaoui, F., & Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Annals of Statistics.
  • Balabdaoui, F., & Wellner, J. A. (2010). Estimation of a k-monotone density: characterizations, consistency and minimax lower bounds. Statistica Neerlandica.
  • Barton, R. R., Nelson, B. L., & Xie, W. (2010). A framework for input uncertainty analysis. Proceedings of the 2010 Winter Simulation Conference.
  • Bühlmann, P., & van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications.
  • Buttrey, S., Royset, J. O., & Wets, R. (2014). XSPL estimator: An R toolbox, ...
  • Carroll, R. J., et al. (2011). Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error. Journal of the American Statistical Association.
  • Chick, S. E. (2001). Input distribution selection for simulation experiments: accounting for input uncertainty. Operations Research.
  • Cule, M., Samworth, R., & Stewart, M. (2010a). Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society, Series B.
  • Cule, M., Samworth, R., & Stewart, M. (2010b). Rejoinder to "Maximum likelihood estimation of a multi-dimensional log-concave density". Journal of the Royal Statistical Society, Series B.
  • Cule, M. L., et al. (2010). Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electronic Journal of Statistics.
  • Davies, P., & Kovac, A. (2004). Densities, spectral densities and modality. The Annals of Statistics.
  • Dechevsky, L., & Penev, S. (1997). On shape-preserving probabilistic wavelet approximators. Stochastic Analysis and Applications.
  • Delecroix, M., & Thomas-Agnan, C. (2000). Spline and kernel regression under shape restrictions.
  • DeVore, R. A. (1977a). Monotone approximation by polynomials. SIAM Journal on Mathematical Analysis.
  • DeVore, R. A. (1977b). Monotone approximation by splines. SIAM Journal on Mathematical Analysis.
  • Dong, M. X., & Wets, R. J.-B. (2007). Estimating density functions: a constrained maximum likelihood approach. Journal of Nonparametric Statistics.
  • Doss, C. R. (2013). Shape-constrained inference for concave-transformed densities and their modes. PhD dissertation, ...
  • Dümbgen, L., & Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli.
  • Dümbgen, L., et al. (2011). Approximation by log-concave distributions with applications to regression. Annals of Statistics.
  • Eggermont, P. B., & LaRiccia, V. (2001). Maximum Penalized Likelihood Estimation, Volume I: Density Estimation.
  • Feng, Y., Gade, D., Ryan, S. M., Watson, J.-P., Wets, R. J.-B., & Woodruff, D. L. (2013). A new approximation method for generating day-ahead load scenarios. 2013 IEEE Power & Energy Society General Meeting.
  • Freimer, M., & Schruben, L. (2002). Collecting data and estimating parameters for input distributions. Proceedings of the 2002 Winter Simulation Conference.
  • Gao, F., & Wellner, J. A. (2009). On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China Series A: Mathematics.
  • Geman, S., & Hwang, C.-R. (1982). Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics.
  • Good, I. J., & Gaskins, R. A. (1971). Nonparametric roughness penalties for probability densities. Biometrika.
  • Grenander, U. (1956a). On the theory of mortality measurement, Part I. Skandinavisk Aktuarietidskrift.
  • Grenander, U. (1956b). On the theory of mortality measurement, Part II. Skandinavisk Aktuarietidskrift.
  • Grenander, U. (1981). Abstract Inference.
  • Groeneboom, P., Jongbloed, G., & Wellner, J. A. (2001). Estimation of a convex function: characterizations and asymptotic theory. Annals of Statistics.
  • Groeneboom, P., & Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation.
  • Hall, P., & Kang, K.-H. (2005). Unimodal kernel density estimation by data sharpening. Statistica Sinica.
  • Jongbloed, G. (1998). The iterative convex minorant algorithm for nonparametric estimation. Journal of Computational and Graphical Statistics.
  • King, A. J., & Rockafellar, R. T. (1990). Asymptotic theory for solutions of generalized M-estimation and stochastic ...