Stochastics and Statistics

Fusion of hard and soft information in nonparametric density estimation
Introduction
It is recognized that statistical estimates can be improved greatly by including contextual information to supplement the information derived from data. We refer to the contextual information as soft information, in contrast to hard information derived from observations (data). In this paper, we consider univariate probability density estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample size, not surprisingly, it is when the sample size is small that this fusion of hard and soft information plays a crucial role in producing quality estimates. We limit the scope to densities of random variables with distributions that are absolutely continuous with respect to the Lebesgue measure on a bounded interval.
The need for estimating probability density functions is prevalent across operations research and management science. For example, an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see, for example, Barton, Nelson, and Xie (2010); Chick (2001); Freimer and Schruben (2002). Density estimation is also needed when populating probability models and when analyzing simulation output beyond the typical first and second moments. In all these situations, however, the available sample is typically extremely small due to practical and computational limitations, and one is usually forced to restrict attention to parametric families of densities. In this paper, we provide the theoretical foundations of an alternative approach that brings in soft information about problem structure and past experience to obtain reasonable nonparametric density estimates even for very small sample sizes. The approach has been successfully applied in the context of simulation output analysis (Singham, Royset, & Wets, 2013), uncertainty quantification (Royset, Sukumar, & Wets, 2013), as well as estimation of errors in forecasts for commodity prices (Wets & Rios, under review) and electricity demand (Feng, Gade, Ryan, Watson, Wets, & Woodruff, 2013); see also Rios, Wets, and Woodruff (under review).
A natural and widely studied approach to density estimation is to adopt an M-estimator with additional constraints to account for soft information. We continue this tradition by defining an estimator that is an optimal solution of a constrained maximum likelihood problem. An appealing property of such estimators is that for any sample size, an estimate is the best possible within the class of allowable functions according to the given criterion (likelihood).
We trace the consideration of soft information in terms of shape constraints at least back to Grenander (1956a, 1956b). More recent studies of univariate log-concave densities include Balabdaoui, Rufibach, and Wellner (2009); Dümbgen and Rufibach (2009); Groeneboom and Wellner (1992); Jongbloed (1998); Pal, Woodroofe, and Meyer (2007); Walther (2002), with computational comparisons in Rufibach (2007); see also the review Walther (2009) and, in the case of multivariate densities, e.g., Cule, Samworth, and Stewart (2010a, 2010b). Convexity and monotonicity restrictions are examined in Groeneboom, Jongbloed, and Wellner (2001); Meyer (2012b), and monotonicity, monotonicity and convexity, U-shape, as well as unimodality with known mode are studied in Meyer (2012b); Meyer and Habtzghib (2011). Unimodal functions are also covered in Hall and Kang (2005); Reboul (2005), with the former covering U-shape as well. Monotone, convex, and log-concave densities are dealt with in Birke (2009). Studies of k-monotone densities include Balabdaoui and Wellner (2007, 2010); Gao and Wellner (2009). Densities given as monotone transformations of convex functions are examined in Seregin and Wellner (2010). Convex formulations of a collection of shape restrictions are discussed in Papp (2011); Papp and Alizadeh (2014). We refer to the recent dissertation Doss (2013) and the discussion in Cule, Samworth, and Stewart (2010b) for a more comprehensive review, and to Lim and Glynn (2012) for the related context of shape-restricted regression.
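The oldest entry in this line is concrete enough to sketch: Grenander's estimator of a nonincreasing density is the left derivative of the least concave majorant of the empirical distribution function, computable in a single hull pass. The following minimal pure-Python illustration is ours, not the paper's code; it assumes distinct, nonnegative observations and takes the support bound at the largest observation:

```python
def grenander(sample):
    """Maximum likelihood estimate of a nonincreasing density on
    [0, max(sample)]: the slopes of the least concave majorant of
    the empirical CDF. Assumes distinct, nonnegative observations."""
    xs = sorted(sample)
    n = len(xs)
    # vertices of the empirical CDF, prepended with the origin
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = [pts[0]]
    for p in pts[1:]:
        # maintain the upper (concave) hull: pop while the last point
        # fails the strictly-decreasing-slopes test
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p[0] - x2) <= (p[1] - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    # the density estimate is the hull slope on each segment:
    # a list of (left endpoint, right endpoint, density value)
    return [(x1, x2, (y2 - y1) / (x2 - x1))
            for (x1, y1), (x2, y2) in zip(hull, hull[1:])]
```

Here the shape constraint alone, with no bandwidth or penalty parameter, turns the raw empirical measure into a piecewise-constant density estimate.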
Although these studies address important cases, there is no overarching framework that allows for a comprehensive description of soft information formulated by a large variety of constraints. Initial work in this direction is found in Wang (1996), which deals with parametric nonlinear least-squares regression subject to a finite number of smooth equality and inequality constraints. That paper examines the asymptotics of the least-squares estimator using the convergence theory of constrained optimization, specifically epi-convergence. In the context of constrained maximum likelihood estimation, Dong and Wets (2007) establishes consistency of an estimator through a functional law of large numbers and epi-convergence. The latter work is an immediate forerunner to the present paper.
Having adopted a nonparametric constrained maximum likelihood framework, we face technical challenges along two axes. First, one needs to deal with constrained optimization problems. Of course, in principle, constraints can be handled through penalties and regularizations; see for example Good and Gaskin (1971); Klonias (1982); Leonard (1978); de Montricher, Tapia, and Thompson (1975); Silverman (1982); Thompson and Tapia (1990) and more recently Bühlmann and van de Geer (2011); Eggermont and LaRiccia (2001); Koenker and Mizera (2006, 2008, 2010); Meyer (2012a); Turlach (2005). However, the equivalence and interpretations of such reformulations depend on the successful selection of multipliers and penalty parameters, which is far from trivial in practice, especially in the case of multiple constraints. In fact, poor selection of these multipliers and parameters may cause computational challenges due to ill-conditioning of the resulting optimization problem as well as significant deterioration of the quality of the resulting density estimate. Moreover, it becomes unclear in what sense, if any, an estimator is “best” when an otherwise natural criterion such as likelihood is mixed with nonzero penalty terms; see Dong and Wets (2007) for further discussion. It is also possible to devise specialized algorithms such as the iterative convex minorant algorithm Groeneboom and Wellner (1992); Jongbloed (1998) to account for certain constraints, or to modify “unconstrained” estimators such as those based on kernels; Hall and Kang (2005) handles unimodality, Birke (2009) considers monotonicity, convexity, and log-concavity, and Davies and Kovac (2004) aims to reduce the number of modes; see Racine (2015); Wolters (2012) for computational tools. Again, it is unclear in what sense, if any, such estimates are “best” in the case of finite samples. Moreover, it is challenging to generalize these approaches to handle other types of soft information.
We direct the reader to Tsybakov (2009) and references therein for treatments of kernel estimators including a discussion of optimality.
The second challenge with a nonparametric constrained maximum likelihood framework is the infinite-dimensionality of the resulting optimization problem. Naturally, there is a computational need to consider families of approximating densities characterized by a finite number of parameters. The method of sieves Chen (2007); Geman and Hwang (1982); Grenander (1981) provides a framework for constructing, typically, finite-dimensional approximating subsets that are gradually refined as the sample size grows and that, in the limit, are dense in a function space of interest. However, difficulties arise from three directions. First, with our focus on small sample sizes, the linkage between sample size and sieves becomes untenable. Second, in order to allow for the possibility of discontinuous densities and exponential transformations, we choose as underlying space the extended real-valued lower or upper semicontinuous functions, but neither is a linear space. Consequently, the mathematically inbred tendency to obtain a finite-dimensional approximation by relying on a well-chosen finite basis is problematic; see for example Delecroix and Thomas-Agnan (2000); Meyer (2012a) for such an approach based on splines. Third, despite progress towards handling shape restrictions on sieves (see for example Dechevsky and Penev (1997); DeVore (1977a, 1977b); Papp (2011); Papp and Alizadeh (2014)), there is no straightforward way of handling a comprehensive set of soft information.
In this paper, as in Dong and Wets (2007), we consider an arbitrarily constrained maximum likelihood estimator for densities. We appear to be the first to consider such general constraints (soft information) in the context of nonparametric density estimation. The soft information might even be random, i.e., the soft information may not be known a priori but is realized with the sample. We give concrete formulations of the constrained maximum likelihood problem in the case of soft information about support bounds, semicontinuity, continuity, smoothness, slope information and related quantities, monotonicity, log-concavity, unimodality, location of modes, symmetry, bounds on density values, neighborhood of known density, bounds on moments, and bounds on cumulative distribution functions. We allow for any combination of these, and essentially any other constraint too.
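To fix ideas, many of the constraint types just listed reduce, after discretization, to finitely many inequalities on grid values of s = -log h, where h denotes the density. The following toy checker is our own illustration of that reduction, not the paper's formulation; all parameter names are ours:

```python
def satisfies_soft_info(s_vals, mode_index=None, nonincreasing_h=False,
                        s_lower=None, s_upper=None):
    """Check discretized soft information on grid values of
    s = -log h (small s means high density). Illustrative only."""
    ok = True
    if nonincreasing_h:
        # h nonincreasing  <=>  s nondecreasing
        ok = ok and all(a <= b for a, b in zip(s_vals, s_vals[1:]))
    if mode_index is not None:
        # unimodal h with mode at the given grid index:
        # s decreases up to the mode, then increases
        left, right = s_vals[:mode_index + 1], s_vals[mode_index:]
        ok = ok and all(a >= b for a, b in zip(left, left[1:]))
        ok = ok and all(a <= b for a, b in zip(right, right[1:]))
    if s_lower is not None:
        # lower bound on s  <=>  upper bound exp(-s_lower) on density values
        ok = ok and all(v >= s_lower for v in s_vals)
    if s_upper is not None:
        ok = ok and all(v <= s_upper for v in s_vals)
    return ok
```

Log-concavity of h similarly becomes convexity of s, i.e., linear inequalities on second differences, so combinations of such requirements remain tractable constraint sets.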
We overcome the technical difficulty caused by constraints through the theory of constrained optimization, specifically epi-convergence, and therefore avoid tuning parameters related to penalties and regularization. With the exception of the preliminary work Dong and Wets (2007), this paper is the first to utilize epi-convergence to analyze constrained density estimators. We overcome the difficulty of infinite dimensionality through the use of a new class of splines, epi-splines Royset and Wets (2014), which are highly flexible, allow for discontinuities, and enable convenient exponential transformations. Here, for the first time, the theoretical foundations for using epi-splines in density estimation are laid out. In contrast to sieves, epi-splines can be constructed independently of the sample and therefore handle small sample sizes naturally. The precursor Dong and Wets (2007) relies on a finite approximation by Fourier coefficients. In this paper, we consider the spaces of extended real-valued semicontinuous functions, exponential transformations, and epi-spline approximations.
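To indicate why the exponential transformation is convenient, consider the simplest case of a zeroth-order (piecewise-constant) epi-spline s on a partition of a bounded interval; the epi-splines of Royset and Wets (2014) are richer piecewise polynomials, and the names in this sketch are ours:

```python
import math

def expo_epispline(knots, s_vals):
    """Density h(x) = exp(-s(x)) / c for a piecewise-constant s
    (a zeroth-order epi-spline) on the partition given by `knots`;
    s_vals[k] is the value of s on (knots[k], knots[k+1]].
    Returns the normalization constant c and a callable density."""
    assert len(s_vals) == len(knots) - 1
    # exact integral of exp(-s) over each piece, summed
    c = sum(math.exp(-s) * (b - a)
            for s, a, b in zip(s_vals, knots, knots[1:]))
    def h(x):
        if not knots[0] <= x <= knots[-1]:
            return 0.0  # the density vanishes outside the bounded support
        for k in range(len(s_vals)):
            if x <= knots[k + 1]:
                return math.exp(-s_vals[k]) / c
        return 0.0
    return c, h
```

Any real-valued coefficients s_vals yield a strictly positive density on the support that integrates to one, so positivity never has to be imposed as a separate constraint on s.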
The reliance on epi-convergence and epi-splines allows us to view the constrained maximum likelihood problem as an approximation of a limiting optimization problem involving the actual probability density, correct soft information, and the full space of semicontinuous functions; we refer to Pflug and Wets (2013) for a related study in the context of regression utilizing graphical convergence. Consequently, we do not merely approximate a certain function space or deal with a finite sample size, but study the approximation of the whole estimation process as formulated by the limiting optimization problem. The approach facilitates the examination of families of estimators such as those that are near-optimal solutions of a constrained maximum likelihood problem.
Our primary motivation is to obtain reasonable estimates in situations with little hard information and we provide a consistency result as soft information is refined, quantify finite sample errors, and present a small computational study to motivate the estimator in that regard. Still, we also establish consistency and quantify asymptotic rates, as hard information is refined, under general constraints.
We focus exclusively on univariate densities that vanish beyond a compact interval of the real line. Although most of the results extend to the unbounded case and higher dimensions, technical issues will then become prominent and obscure the treatment of arbitrary random constraints and the supporting epi-spline approximations. Moreover, with a small sample, tail behavior can only come in via soft information, which is easily handled by our framework but omitted here for simplicity; a few experimental results can be found in Sood and Wets (2011).
The paper proceeds in Section 2 by defining the constrained maximum likelihood estimator, summarizing the underlying approximation theory, which is based on Royset and Wets (2014), and discussing existence and uniqueness. Section 3 exemplifies the breadth of soft information that can be included, and Section 4 provides consistency, asymptotics, and finite sample error results. A small collection of numerical examples is featured in Section 5. The paper is summarized in Section 6.
Section snippets
Exponential Epi-Spline estimator
This section formulates a constrained maximum likelihood problem and presents a finite-dimensional approximation. We discuss existence, uniqueness, and computations. The section also includes the prerequisite approximation results.
Soft information
We implement soft information about the density under consideration in the estimation problem through a constraint set Rn, which can be essentially any, possibly random, subset of the underlying function space. It is observed empirically, and also illustrated in Section 5, that soft information tends to improve density estimates. In this section, we give a soft consistency theorem that, in part, explains these observations. We also give examples of constraints for specific instances of soft information. We start, however, with
Consistency, asymptotics, and error bounds
Being concerned, from now on, with asymptotics, we again view the estimation problem as a random optimization problem, i.e., one whose random elements are the variables and the random set Sn; we still designate a solution by sn, which is now itself a random epi-spline. To establish consistency, asymptotics, and other results, we view the problem, for given m and p, as a sequence of random optimization problems that under quite general
Numerical examples
We illustrate the exponential epi-spline estimator through a series of examples using a freely available Matlab toolbox Royset and Wets (2013) that relies on the fmincon solver (Matlab 7.10.0); see also Buttrey, Royset, and Wets (2014) for a corresponding R toolbox. The focus is on showing the effect of including various sources of soft information in the context of small sample sizes. Section 5.1 shows estimates of an exponential density using 10 observations and an increasing collection of
Conclusions
We have developed a constrained maximum likelihood estimator that incorporates any soft information that might be available and therefore offers substantial flexibility for practitioners. In particular in situations with few (hard) observations, soft information can be brought in and reasonable estimates can be achieved with as little as 10 sample points. In simple but illustrative examples of estimating exponential, normal, and mixture of exponential distributions, we construct new estimates
Acknowledgments
This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant numbers 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273. The authors thank the referees for insightful comments, Drs. R. Sood and D. Singham for carrying out a part of the numerical tests, and Prof. N. Sukumar for invigorating discussions.
References (75)

- Shape constrained kernel density estimation. Journal of Statistical Planning and Inference (2009).
- Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics (2007).
- The topology of the ρ-Hausdorff distance. Annali di Matematica Pura ed Applicata (1991).
- Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics (2009).
- Estimation of a k-monotone density: Limit distribution theory and the spline connection. Annals of Statistics (2007).
- Estimation of a k-monotone density: Characterizations, consistency and minimax lower bounds. Statistica Neerlandica (2010).
- A framework for input uncertainty analysis. Proceedings of the 2010 Winter Simulation Conference (2010).
- Statistics for High-Dimensional Data: Methods, Theory and Applications (2011).
- Buttrey, S., Royset, J. O., & Wets, R. (2014). XSPL estimator: An R toolbox, ...
- Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error. Journal of the American Statistical Association (2011).
- Input distribution selection for simulation experiments: Accounting for input uncertainty. Operations Research (2001).
- Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society, Series B (2010).
- Rejoinder to maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society, Series B (2010).
- Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electronic Journal of Statistics.
- Densities, spectral densities and modality. The Annals of Statistics (2004).
- On shape-preserving probabilistic wavelet approximators. Stochastic Analysis and Applications (1997).
- Spline and kernel regression under shape restrictions (2000).
- Monotone approximation by polynomials. SIAM Journal on Mathematical Analysis (1977).
- Monotone approximation by splines. SIAM Journal on Mathematical Analysis (1977).
- Estimating density functions: A constrained maximum likelihood approach. Journal of Nonparametric Statistics (2007).
- Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli (2009).
- Approximation by log-concave distributions with applications to regression. Annals of Statistics.
- Maximum penalized likelihood estimation, Volume I: Density estimation (2001).
- A new approximation method for generating day-ahead load scenarios. 2013 IEEE Power & Energy Society General Meeting (2013).
- Collecting data and estimating parameters for input distributions. Proceedings of the 2002 Winter Simulation Conference (2002).
- On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China Series A: Mathematics (2009).
- Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics (1982).
- Nonparametric roughness penalties for probability densities. Biometrika (1971).
- On the theory of mortality measurement, I. Skandinavisk Aktuarietidskrift (1956).
- On the theory of mortality measurement, II. Skandinavisk Aktuarietidskrift (1956).
- Abstract inference (1981).
- Estimation of a convex function: Characterizations and asymptotic theory. Annals of Statistics (2001).
- Information bounds and nonparametric maximum likelihood estimation (1992).
- Unimodal kernel density estimation by data sharpening. Statistica Sinica (2005).
- The iterative convex minorant algorithm for nonparametric estimation. Journal of Computational and Graphical Statistics (1998).