Stochastics and Statistics
Fusion of hard and soft information in nonparametric density estimation

https://doi.org/10.1016/j.ejor.2015.06.034

Highlights

  • We address how to fuse hard and soft information for probability density estimation.

  • We formulate the problem as a stochastic optimization model.

  • We examine convexity, consistency, and asymptotics.

  • We illustrate the approach in a variety of settings.

Abstract

This paper discusses univariate density estimation in situations when the sample (hard information) is supplemented by “soft” information about the random phenomenon. These situations arise broadly in operations research and management science where practical and computational reasons severely limit the sample size, but problem structure and past experiences could be brought in. In particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum likelihood estimator that incorporates any, possibly random, soft information through an arbitrary collection of constraints. We illustrate the breadth of possibilities by discussing soft information about shape, support, continuity, smoothness, slope, location of modes, symmetry, density values, neighborhood of known density, moments, and distribution functions. The maximization takes place over spaces of extended real-valued semicontinuous functions and therefore allows us to consider essentially any conceivable density as well as convenient exponential transformations. The infinite dimensionality of the optimization problem is overcome by approximation with splines tailored to these spaces. To facilitate the treatment of small samples, the construction of these splines is decoupled from the sample. We discuss existence and uniqueness of the estimator, examine consistency under increasing hard and soft information, and give rates of convergence. Numerical examples illustrate the value of soft information, the ability to generate a family of diverse densities, and the effect of misspecification of soft information.

Introduction

It is recognized that statistical estimates can be improved greatly by including contextual information to supplement the information derived from data. We refer to the contextual information as soft information, in contrast to hard information derived from observations (data). In this paper, we consider univariate probability density estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample size, not surprisingly, it is when the sample size is small that this fusion of hard and soft information plays a crucial role in producing quality estimates. We limit the scope to densities of random variables with distributions that are absolutely continuous with respect to the Lebesgue measure on a bounded interval.

The need for estimating probability density functions is prevalent across operations research and management science. For example, an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see for example Barton, Nelson, and Xie (2010); Chick (2001); Freimer and Schruben (2002). Density estimation is also needed when populating probability models and when analyzing simulation output beyond the typical first and second moments. In all these situations, however, the available sample is typically extremely small due to practical and computational limitations, and one is usually forced to restrict attention to parametric families of densities. In this paper, we provide the theoretical foundations of an alternative approach that brings in soft information about problem structure and past experience to obtain reasonable nonparametric density estimates even for very small sample sizes. The approach has been successfully applied in the context of simulation output analysis (Singham, Royset, & Wets, 2013), uncertainty quantification (Royset, Sukumar, & Wets, 2013), as well as estimation of errors in forecasts for commodity prices (Wets & Rios, under review) and electricity demand (Feng, Gade, Ryan, Watson, Wets, & Woodruff, 2013); see also Rios, Wets, and Woodruff (under review).

A natural and widely studied approach to density estimation is to adopt an M-estimator with additional constraints to account for soft information. We continue this tradition by defining an estimator that is an optimal solution of a constrained maximum likelihood problem. An appealing property of such estimators is that for any sample size, an estimate is the best possible within the class of allowable functions according to the given criterion (likelihood).
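To fix ideas before the formal development in Section 2, the estimator can be sketched as follows; the notation here is ours, simplified from the paper's:

\[
\hat{h}^n \in \operatorname*{argmax}_{h} \Big\{ \tfrac{1}{n}\sum_{i=1}^{n} \log h(X_i) \;:\; h \text{ a density on } [m_0,m_N] \text{ satisfying the soft-information constraints} \Big\},
\]

where $X_1,\dots,X_n$ is the sample. Whatever the sample size, $\hat{h}^n$ is then best, by the likelihood criterion, among all densities consistent with the soft information.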

We trace the consideration of soft information in terms of shape constraints at least back to Grenander (1956a, 1956b). More recent studies of univariate log-concave densities include Balabdaoui, Rufibach, and Wellner (2009); Dümbgen and Rufibach (2009); Groeneboom and Wellner (1992); Jongbloed (1998); Pal, Woodroofe, and Meyer (2007); Walther (2002), with computational comparisons in Rufibach (2007); see also the review Walther (2009) and, in the case of multivariate densities, e.g., Cule, Samworth, and Stewart (2010a, 2010b). Convexity and monotonicity restrictions are examined in Groeneboom, Jongbloed, and Wellner (2001); Meyer (2012b), and monotonicity, monotonicity combined with convexity, U-shape, as well as unimodality with known mode are studied in Meyer (2012b); Meyer and Habtzghi (2011). Unimodal functions are also covered in Hall and Kang (2005); Reboul (2005), with the former covering U-shape as well. Monotone, convex, and log-concave densities are dealt with in Birke (2009). Studies of k-monotone densities include Balabdaoui and Wellner (2007, 2010); Gao and Wellner (2009). Densities given as monotone transformations of convex functions are examined in Seregin and Wellner (2010). Convex formulations of a collection of shape restrictions are discussed in Papp (2011); Papp and Alizadeh (2014). We refer to the recent dissertation Doss (2013) and the discussion in Cule, Samworth, and Stewart (2010b) for a more comprehensive review, and to Lim and Glynn (2012) for the related context of shape-restricted regression.

Although these studies address important cases, there is no overarching framework that allows for a comprehensive description of soft information formulated by a large variety of constraints. Initial work in this direction is found in Wang (1996), which deals with parametric nonlinear least-squares regression subject to a finite number of smooth equality and inequality constraints. That paper examines the asymptotics of the least-squares estimator using the convergence theory of constrained optimization, specifically epi-convergence. In the context of constrained maximum likelihood estimation, Dong and Wets (2007) establishes consistency of an estimator through a functional law of large numbers and epi-convergence. The latter work is an immediate forerunner to the present paper.

Having adopted a nonparametric constrained maximum likelihood framework, we face technical challenges along two axes. First, one needs to deal with constrained optimization problems. Of course, in principle, constraints can be handled through penalties and regularizations; see for example Good and Gaskins (1971); Klonias (1982); Leonard (1978); de Montricher, Tapia, and Thompson (1975); Silverman (1982); Thompson and Tapia (1990) and more recently Bühlmann and van de Geer (2011); Eggermont and LaRiccia (2001); Koenker and Mizera (2006, 2008, 2010); Meyer (2012a); Turlach (2005). However, the equivalence and interpretation of such reformulations depend on the successful selection of multipliers and penalty parameters, which is far from trivial in practice, especially in the case of multiple constraints. In fact, poor selection of these multipliers and parameters may cause computational challenges due to ill-conditioning of the resulting optimization problem, as well as significant deterioration in the quality of the resulting density estimate. Moreover, it becomes unclear in what sense, if any, an estimator is “best” when an otherwise natural criterion such as likelihood is mixed with nonzero penalty terms; see Dong and Wets (2007) for further discussion. It is also possible to devise specialized algorithms, such as the iterative convex minorant algorithm (Groeneboom & Wellner, 1992; Jongbloed, 1998), to account for certain constraints, or to modify “unconstrained” estimators such as those based on kernels: Hall and Kang (2005) handles unimodality, Birke (2009) considers monotonicity, convexity, and log-concavity, and Davies and Kovac (2004) aims to reduce the number of modes; see Racine (2015); Wolters (2012) for computational tools. Again, it is unclear in what sense, if any, such estimates are “best” in the case of finite samples. Moreover, it is challenging to generalize these approaches to handle other types of soft information. We direct the reader to Tsybakov (2009) and references therein for treatments of kernel estimators, including a discussion of optimality.
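The distinction can be made concrete with a schematic comparison; the notation is ours, with $J$ a generic roughness or shape functional, $\lambda$ a penalty parameter, and $c$ a constraint bound:

\[
\max_{h}\ \frac{1}{n}\sum_{i=1}^{n}\log h(X_i)\;-\;\lambda\,J(h)
\qquad\text{versus}\qquad
\max_{h}\ \Big\{\frac{1}{n}\sum_{i=1}^{n}\log h(X_i)\;:\;J(h)\le c\Big\}.
\]

In the penalized form, a $\lambda$ must be tuned for every piece of soft information; in the constrained form, the bound $c$ states the soft information directly, and the maximizer retains the interpretation of being best within the constrained class.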

The second challenge with a nonparametric constrained maximum likelihood framework is the infinite dimensionality of the resulting optimization problem. Naturally, there is a computational need to consider families of approximating densities characterized by a finite number of parameters. The method of sieves (Chen, 2007; Geman & Hwang, 1982; Grenander, 1981) provides a framework for constructing, typically, finite-dimensional approximating subsets that are gradually refined as the sample size grows and that, in the limit, are dense in a function space of interest. However, difficulties arise from three directions. First, with our focus on small sample sizes, the linkage between sample size and sieves becomes untenable. Second, in order to allow for the possibility of discontinuous densities and exponential transformations, we choose as underlying space the extended real-valued lower or upper semicontinuous functions, but neither is a linear space. Consequently, the ingrained mathematical tendency to obtain a finite-dimensional approximation by relying on a well-chosen finite basis is problematic; see for example Delecroix and Thomas-Agnan (2000); Meyer (2012a) for such an approach based on splines. Third, despite progress towards handling shape restrictions on sieves (see for example Dechevsky and Penev (1997); DeVore (1977a, 1977b); Papp (2011); Papp and Alizadeh (2014)), there is no straightforward way of handling a comprehensive set of soft information.

In this paper, as in Dong and Wets (2007), we consider an arbitrarily constrained maximum likelihood estimator for densities. We appear to be the first to consider such general constraints (soft information) in the context of nonparametric density estimation. The soft information might even be random, i.e., not known a priori but realized with the sample. We give concrete formulations of the constrained maximum likelihood problem in the case of soft information about support bounds, semicontinuity, continuity, smoothness, slope information and related quantities, monotonicity, log-concavity, unimodality, location of modes, symmetry, bounds on density values, neighborhoods of known densities, bounds on moments, and bounds on cumulative distribution functions. We allow for any combination of these, and essentially any other constraint as well.
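To indicate the flavor of these formulations ahead of Section 3, write the density as $h = e^{-s}$; several of the items just listed then become simple conditions on $s$ (an illustrative selection, in our notation):

\begin{align*}
&\text{nonincreasing density on } [m_0,m_N]: && s \text{ nondecreasing};\\
&\text{log-concavity:} && s \text{ convex};\\
&\text{symmetry about } c: && s(c+x) = s(c-x) \text{ for all } x;\\
&\text{moment bound:} && l \le \int_{m_0}^{m_N} x\, e^{-s(x)}\,dx \le u.
\end{align*}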

We overcome the technical difficulty caused by constraints through the theory of constrained optimization, specifically epi-convergence, and therefore avoid tuning parameters related to penalties and regularization. With the exception of the preliminary work Dong and Wets (2007), this paper is the first to utilize epi-convergence to analyze constrained density estimators. We overcome the difficulty of infinite dimensionality through the use of a new class of splines, epi-splines (Royset & Wets, 2014), which are highly flexible, allow for discontinuities, and enable convenient exponential transformations. Here, for the first time, the theoretical foundations for using epi-splines in density estimation are laid out. In contrast to sieves, epi-splines can be constructed independently of the sample and therefore handle small sample sizes naturally. The precursor Dong and Wets (2007) relies on a finite approximation of L² by Fourier coefficients. In this paper, we consider the spaces of extended real-valued semicontinuous functions, exponential transformations, and epi-spline approximations.
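For orientation, we recall the notion informally; this is our paraphrase of Royset and Wets (2014), not a verbatim definition. Given a mesh $m_0 < m_1 < \cdots < m_N$, a pth-order epi-spline is a function that agrees with some polynomial of degree at most $p$ on each open mesh interval and takes arbitrary real values at the mesh points:

\[
s(x) = p_k(x) \ \text{ for } x \in (m_{k-1}, m_k), \quad \deg p_k \le p, \qquad s(m_k) \in \mathbb{R}, \ k = 0,\dots,N.
\]

Counting $p+1$ polynomial coefficients on each of the $N$ intervals plus $N+1$ mesh-point values yields the $(p+2)N+1$ parameters that appear in the finite-dimensional problems below.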

The reliance on epi-convergence and epi-splines allows us to view the constrained maximum likelihood problem as an approximation of a limiting optimization problem involving the actual probability density, correct soft information, and the full space of semicontinuous functions; we refer to Pflug and Wets (2013) for a related study in the context of regression utilizing graphical convergence. Consequently, we do not merely approximate a certain function space or contend with a finite sample size, but study the approximation of the whole estimation process as formulated by the limiting optimization problem. The approach facilitates the examination of families of estimators, such as those that are near-optimal solutions of a constrained maximum likelihood problem.

Our primary motivation is to obtain reasonable estimates in situations with little hard information: we provide a consistency result as soft information is refined, quantify finite-sample errors, and present a small computational study to motivate the estimator in that regard. Still, we also establish consistency and quantify asymptotic rates, as hard information is refined, under general constraints.

We focus exclusively on univariate densities that vanish beyond a compact interval of the real line. Although most of the results extend to the unbounded case and higher dimensions, technical issues will then become prominent and obscure the treatment of arbitrary random constraints and the supporting epi-spline approximations. Moreover, with a small sample, tail behavior can only come in via soft information, which is easily handled by our framework but omitted here for simplicity; a few experimental results can be found in Sood and Wets (2011).

The paper proceeds in Section 2 by defining the constrained maximum likelihood estimator, summarizing the underlying approximation theory, which is based on Royset and Wets (2014), and discussing existence and uniqueness. Section 3 exemplifies the breadth of soft information that can be included, and Section 4 provides consistency, asymptotics, and finite-sample error results. A small collection of numerical examples is featured in Section 5. The paper is summarized in Section 6.

Section snippets

Exponential Epi-Spline estimator

This section formulates a constrained maximum likelihood problem and presents a finite-dimensional approximation. We discuss existence, uniqueness, and computations. The section also includes the prerequisite approximation results.

Soft information

We implement soft information about the density under consideration in the estimation problem $(\tilde{P}^n_{p,m})$ through the set $R^n$, which can be any, possibly random, subset of $\mathbb{R}^{(p+2)N+1}$. It is observed empirically, and also illustrated in Section 5, that soft information tends to improve density estimates. In this section, we give a soft consistency theorem that, in part, explains these observations. We also give examples of constraints for specific instances of soft information. We start, however, with
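As an illustration of how such constraints reach the coefficient level (a hypothetical first-order example in our notation, not one of the paper's formulations): with $p = 1$ and $s(x) = a_k + b_k x$ on the $k$th mesh interval, the soft information "nonincreasing density", i.e., $s$ nondecreasing, becomes the linear inequalities

\[
b_k \ge 0, \quad k = 1,\dots,N, \qquad a_k + b_k m_k \;\le\; s(m_k) \;\le\; a_{k+1} + b_{k+1} m_k,
\]

so that $R^n$ is a closed convex subset of $\mathbb{R}^{3N+1}$.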

Consistency, asymptotics, and error bounds

Being concerned, from now on, with asymptotics, we again view $(P^n_{p,m})$ as a random optimization problem, i.e.,

\[
(P^n_{p,m}): \quad \min_{s \in S^n} \ \frac{1}{n}\sum_{i=1}^{n} s(X_i) \quad \text{such that} \quad \int_{m_0}^{m_N} e^{-s(x)}\,dx = 1,
\]

whose random elements are the variables $X_1,\dots,X_n$ and the random set $S^n$; we still designate a solution by $s^n$, which is now, itself, a random epi-spline. To establish consistency, derive asymptotics, and obtain other results, we view $\{(P^n_{p,m})\}_{n=1}^{\infty}$, for given $m$ and $p$, as a sequence of random optimization problems that under quite general
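A minimal computational sketch of the problem $(P^n_{p,m})$ above, assuming a zeroth-order (piecewise-constant) epi-spline and using scipy's general-purpose SLSQP solver rather than the authors' Matlab toolbox; the mesh, sample, and monotonicity constraint are illustrative choices:

```python
# Constrained ML sketch: density h = exp(-s) with s piecewise constant on a
# mesh of [m0, mN]; minimize (1/n) sum_i s(X_i) subject to the normalization
# constraint (exp(-s) integrates to one) and, as soft information, a
# nonincreasing density (s nondecreasing). Illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=10)        # a small "hard" sample
m0, mN, N = 0.0, 6.0, 20                       # support bound (soft info), mesh size
mesh = np.linspace(m0, mN, N + 1)
dx = np.diff(mesh)                             # interval lengths
# index of the mesh interval containing each observation
idx = np.clip(np.searchsorted(mesh, X, side="right") - 1, 0, N - 1)

def objective(s):
    # (1/n) sum_i s(X_i): minimizing this maximizes prod_i exp(-s(X_i))
    return s[idx].mean()

constraints = [
    # hard requirement: exp(-s) must integrate to one over [m0, mN]
    {"type": "eq", "fun": lambda s: np.sum(dx * np.exp(-s)) - 1.0},
    # soft information: nonincreasing density, i.e., s nondecreasing
    {"type": "ineq", "fun": lambda s: np.diff(s)},
]

res = minimize(objective, np.zeros(N), method="SLSQP", constraints=constraints)
density = np.exp(-res.x)                       # estimated density value per interval
```

Dropping the second constraint recovers a histogram-like unconstrained maximum likelihood estimate; additional soft information amounts to appending entries to `constraints`.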

Numerical examples

We illustrate the exponential epi-spline estimator through a series of examples using a freely available Matlab toolbox (Royset & Wets, 2013) that relies on the fmincon solver (Matlab 7.10.0); see also Buttrey, Royset, and Wets (2014) for a corresponding R toolbox. The focus is on showing the effect of including various sources of soft information in the context of small sample sizes. Section 5.1 shows estimates of an exponential density using 10 observations and an increasing collection of

Conclusions

We have developed a constrained maximum likelihood estimator that incorporates any soft information that might be available and therefore offers substantial flexibility for practitioners. In particular, in situations with few (hard) observations, soft information can be brought in and reasonable estimates can be achieved with as few as 10 sample points. In simple but illustrative examples of estimating exponential, normal, and mixture-of-exponential distributions, we construct new estimates

Acknowledgments

This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant numbers 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273. The authors thank the referees for insightful comments, Drs. R. Sood and D. Singham for carrying out a part of the numerical tests, and Prof. N. Sukumar for invigorating discussions.

References (75)

  • Birke, M. (2009). Shape constrained kernel density estimation. Journal of Statistical Planning and Inference.
  • Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics.
  • Attouch, H., et al. (1991). The topology of the ρ-Hausdorff distance. Annali di Matematica Pura ed Applicata.
  • Balabdaoui, F., Rufibach, K., & Wellner, J. A. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics.
  • Balabdaoui, F., & Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Annals of Statistics.
  • Balabdaoui, F., & Wellner, J. A. (2010). Estimation of a k-monotone density: characterizations, consistency and minimax lower bounds. Statistica Neerlandica.
  • Barton, R. R., Nelson, B. L., & Xie, W. (2010). A framework for input uncertainty analysis. Proceedings of the 2010 Winter Simulation Conference.
  • Bühlmann, P., & van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications.
  • Buttrey, S., Royset, J. O., & Wets, R. (2014). XSPL estimator: An R toolbox, ...
  • Carroll, R. J., et al. (2011). Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error. Journal of the American Statistical Association.
  • Chick, S. E. (2001). Input distribution selection for simulation experiments: accounting for input uncertainty. Operations Research.
  • Cule, M., Samworth, R., & Stewart, M. (2010a). Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society, Series B.
  • Cule, M., Samworth, R., & Stewart, M. (2010b). Rejoinder to "Maximum likelihood estimation of a multi-dimensional log-concave density". Journal of the Royal Statistical Society, Series B.
  • Cule, M. L., et al. (2010). Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electronic Journal of Statistics.
  • Davies, P., & Kovac, A. (2004). Densities, spectral densities and modality. The Annals of Statistics.
  • Dechevsky, L., & Penev, S. (1997). On shape-preserving probabilistic wavelet approximators. Stochastic Analysis and Applications.
  • Delecroix, M., & Thomas-Agnan, C. (2000). Spline and kernel regression under shape restrictions.
  • DeVore, R. A. (1977a). Monotone approximation by polynomials. SIAM Journal on Mathematical Analysis.
  • DeVore, R. A. (1977b). Monotone approximation by splines. SIAM Journal on Mathematical Analysis.
  • Dong, M. X., & Wets, R. J.-B. (2007). Estimating density functions: a constrained maximum likelihood approach. Journal of Nonparametric Statistics.
  • Doss, C. R. (2013). Shape-constrained inference for concave-transformed densities and their modes. PhD dissertation, ...
  • Dümbgen, L., & Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli.
  • Dümbgen, L., et al. (2011). Approximation by log-concave distributions with applications to regression. Annals of Statistics.
  • Eggermont, P. B., & LaRiccia, V. (2001). Maximum Penalized Likelihood Estimation, Volume I: Density Estimation.
  • Feng, Y., Gade, D., Ryan, S. M., Watson, J.-P., Wets, R. J.-B., & Woodruff, D. L. (2013). A new approximation method for generating day-ahead load scenarios. 2013 IEEE Power & Energy Society General Meeting.
  • Freimer, M., & Schruben, L. (2002). Collecting data and estimating parameters for input distributions. Proceedings of the 2002 Winter Simulation Conference.
  • Gao, F., & Wellner, J. A. (2009). On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China Series A: Mathematics.
  • Geman, S., & Hwang, C.-R. (1982). Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics.
  • Good, I. J., & Gaskins, R. A. (1971). Nonparametric roughness penalties for probability densities. Biometrika.
  • Grenander, U. (1956a). On the theory of mortality measurement, Part I. Skandinavisk Aktuarietidskrift.
  • Grenander, U. (1956b). On the theory of mortality measurement, Part II. Skandinavisk Aktuarietidskrift.
  • Grenander, U. (1981). Abstract Inference.
  • Groeneboom, P., Jongbloed, G., & Wellner, J. A. (2001). Estimation of a convex function: characterizations and asymptotic theory. Annals of Statistics.
  • Groeneboom, P., & Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation.
  • Hall, P., & Kang, K.-H. (2005). Unimodal kernel density estimation by data sharpening. Statistica Sinica.
  • Jongbloed, G. (1998). The iterative convex minorant algorithm for nonparametric estimation. Journal of Computational and Graphical Statistics.
  • King, A. J., & Rockafellar, R. T. (1990). Asymptotic theory for solutions of generalized M-estimation and stochastic ...