Abstract
We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weighting scheme with the inverse mean square errors of the kernel estimators as weights. This weighting scheme is shown to be optimal under independent kernel estimators. For choosing the measurement, we employ the knowledge gradient policy that relies on predictive distributions to calculate the optimal sampling point. Our method allows a setting where the beliefs are expected to be correlated but the correlation structure is unknown beforehand. Moreover, the proposed policy is shown to be asymptotically optimal.
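The inverse-MSE aggregation idea can be sketched numerically. The following is our own minimal illustration, not the paper's exact estimator: each Nadaraya–Watson estimator uses a different bandwidth, its mean square error is approximated by leave-one-out cross-validation, and the final estimate weights each estimator by its inverse MSE. All function and variable names here are ours.

```python
import numpy as np

def gaussian_kernel(u):
    # standard Gaussian kernel (unnormalized; the normalization cancels)
    return np.exp(-0.5 * u ** 2)

def nw_estimate(x0, xs, ys, h):
    # Nadaraya-Watson kernel regression estimate at x0 with bandwidth h
    w = gaussian_kernel((xs - x0) / h)
    return np.sum(w * ys) / np.sum(w)

def loo_mse(xs, ys, h):
    # leave-one-out squared error as a proxy for the estimator's MSE
    errs = [
        (nw_estimate(xs[j], np.delete(xs, j), np.delete(ys, j), h) - ys[j]) ** 2
        for j in range(len(xs))
    ]
    return np.mean(errs)

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, 200)
ys = np.sin(2 * np.pi * xs) + rng.normal(0.0, 0.3, 200)

x0 = 0.5                                # point at which we aggregate
bandwidths = [0.02, 0.05, 0.1, 0.3]     # one estimator per bandwidth
estimates = np.array([nw_estimate(x0, xs, ys, h) for h in bandwidths])
mses = np.array([loo_mse(xs, ys, h) for h in bandwidths])

weights = (1.0 / mses) / np.sum(1.0 / mses)   # inverse-MSE weights
aggregate = np.dot(weights, estimates)        # final aggregated estimate
```

Since the weights are nonnegative and sum to one, the aggregate always lies between the smallest and largest individual kernel estimates.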
References
Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33, 1926–1951 (1995)
Barton, R.R., Meckesheimer, M.: Chapter 18: Metamodel-based simulation optimization. In: Henderson, S.G., Nelson, B.L. (eds.) Simulation. Handbooks in Operations Research and Management Science, vol. 13, pp. 535–574. Elsevier (2006)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley-Interscience, New York (1995)
Branin, F.H.: Widely convergent method for finding multiple solutions of simultaneous nonlinear equations. IBM J. Res. Dev. 16, 504–522 (1972)
Bunea, F., Nobel, A.: Sequential procedures for aggregating arbitrary estimators of a conditional mean. IEEE Trans. Inf. Theory 54, 1725–1735 (2008)
Chehrazi, N., Weber, T.A.: Monotone approximation of decision problems. Oper. Res. 58, 1158–1177 (2010)
Chick, S.E., Gans, N.: Economic analysis of simulation selection problems. Manag. Sci. 55, 421–437 (2009)
Cochran, W.G., Cox, G.M.: Experimental Designs. Wiley, New York (1957)
Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability 66. Chapman & Hall, London (1996)
Frazier, P.I., Powell, W.B., Dayanik, S.: A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47, 2410–2439 (2008)
Frazier, P.I., Powell, W.B., Dayanik, S.: The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21, 599–613 (2009)
Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi, P. (ed.) Computational Learning Theory. Lecture Notes in Computer Science, vol. 904. Springer, Berlin, Heidelberg (1995)
Fu, M.C.: Chapter 19: Gradient estimation. In: Henderson, S.G., Nelson, B.L. (eds.) Simulation. Handbooks in Operations Research and Management Science, vol. 13, pp. 575–616. Elsevier (2006)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Texts in Statistical Science. Chapman & Hall/CRC, Boca Raton (2003)
George, A., Powell, W.B., Kulkarni, S.R.: Value function approximation using multiple aggregation for multiattribute resource management. J. Mach. Learn. Res. 9, 2079–2111 (2008)
Gibbs, M.: Bayesian Gaussian Processes for Regression and Classification. Ph.D. dissertation, University of Cambridge (1997)
Ginebra, J., Clayton, M.K.: Response surface bandits. J. R. Stat. Soc. Ser. B (Methodological) 57, 771–784 (1995)
Gittins, J., Jones, D.: A dynamic allocation index for the sequential design of experiments. In: Gani, J., Sarkadi, K., Vincze, I. (eds.) Progress in Statistics, pp. 241–266. North-Holland, Amsterdam (1974)
Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. Ser. B (Methodological) 41, 148–177 (1979)
Gupta, S.S., Miescke, K.J.: Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Stat. Plan. Inference 54, 229–244 (1996). 40 Years of Statistical Selection Theory, Part I
Hardle, W.K.: Applied Nonparametric Regression. Cambridge University Press, Cambridge (1992)
Hardle, W.K., Muller, M., Sperlich, S., Werwatz, A.: Nonparametric and Semiparametric Models. Springer, Berlin (2004)
Huang, D., Allen, T.T., Notz, W.I., Zeng, N.: Global optimization of stochastic black-box systems via sequential kriging meta-models. J. Glob. Optim. 34, 441–466 (2006)
Juditsky, A., Nemirovski, A.: Functional aggregation for nonparametric regression. Ann. Stat. 28, 681–712 (2000)
Kaelbling, L.P.: Learning in Embedded Systems. MIT Press, Cambridge (1993)
Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: Advances in Neural Information Processing Systems 17, MIT Press, pp. 697–704 (2005)
Mes, M.R., Powell, W.B., Frazier, P.I.: Hierarchical knowledge gradient for sequential sampling. J. Mach. Learn. Res. 12, 2931–2974 (2011)
Negoescu, D.M., Frazier, P.I., Powell, W.B.: The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS J. Comput. 23, 346–363 (2011)
Nelson, B.L., Swann, J., Goldsman, D., Song, W.: Simple procedures for selecting the best simulated system when the number of alternatives is large. Oper. Res. 49, 950–963 (2001)
Olafsson, S.: Chapter 21: Metaheuristics. In: Henderson, S.G., Nelson, B.L. (eds.) Simulation. Handbooks in Operations Research and Management Science, vol. 13, pp. 633–654. Elsevier (2006)
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. Wiley, Hoboken (2007)
Powell, W.B., Ryzhov, I.: Optimal Learning. Wiley, Hoboken (2012)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Ryzhov, I., Powell, W., Frazier, P.: The knowledge gradient algorithm for a general class of online learning problems (2011)
Spall, J.C.: Introduction to Stochastic Search and Optimization. Wiley, New York (2003)
Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning. MIT Press, Cambridge (1998)
Villemonteix, J., Vazquez, E., Walter, E.: An informational approach to the global optimization of expensive-to-evaluate functions. J. Glob. Optim. 44, 509–534 (2009)
This research was supported in part by grant AFOSR-FA9550-05-1-0121 from the Air Force Office of Scientific Research.
Proofs
In this section, we provide the proofs of the propositions and lemmas used in the paper. For simplicity, when there is no risk of confusion, we write \(K(x,x^{\prime })\) for \(K_{i}(x,x^{\prime })\).
1.1 Proof of Proposition 1
Proof
Let \(\mathcal C \) be a generic subset of \(\mathcal K \). We first show that, for any such \(\mathcal C \), the posterior of \(\mu _{x}\) given \(\mu _{x}^{i,n}\) for all \(i\in \mathcal C \) is normal, with mean and precision given by,
Then, the proposition follows by letting \(\mathcal C =\mathcal K \).
We proceed by induction. For the base case \(\mathcal C =\emptyset \), the posterior is simply the prior \((\mu _{x}^{0},\beta _{x}^{0})\), and the above equation holds.
Now assume the proposed equations for the posterior distribution hold for all \(\mathcal C \) of size \(m\), and consider \(\mathcal C ^{\prime }=\mathcal C \cup \{j\}\) with \(m+1\) elements. By Bayes’ rule
where \(Y_{x}^{j}\) denotes the observations for kernel \(j\). Using the induction hypothesis,
By the independence assumption,
Combining \(\mathbb P _{C}(Y_{x}^{j}\in dh|\mu _{x}=u)\) and \(\mathbb P _{C}(\mu _{x}\in du)\), we obtain
This gives us the desired result. \(\square \)
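The update established above has the standard precision-weighted form for combining a prior with independent normal estimators: the posterior precision is the sum of the individual precisions, and the posterior mean is the precision-weighted average of the means. A small numerical sketch under the independence assumption (the values and variable names are hypothetical, ours rather than the paper's):

```python
import numpy as np

# Independent normal estimates of the same unknown mu_x:
# the prior (mu_0, beta_0) followed by three kernel estimates (mu_i, beta_i),
# where beta denotes precision (1 / variance).
means = np.array([0.0, 1.2, 0.8, 1.0])
precisions = np.array([0.5, 2.0, 4.0, 1.0])

# Posterior precision is the sum of precisions; posterior mean is the
# precision-weighted average of the means.
post_precision = precisions.sum()
post_mean = np.dot(precisions, means) / post_precision
```

Note that more precise estimators (here the third, with precision 4.0) pull the posterior mean toward their value, which is exactly the behavior the proposition formalizes.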
1.2 Proofs of Lemmas
This section contains the lemmas used for proving Theorem 1.
Lemma 1
For all \(x\in \mathcal X \), \(\limsup _{n}\max _{m\le n}\left| \mu _{x}^{0,m}\right| \) is finite almost surely (a.s.).
Proof
We fix \(x\in \mathcal X \). For each \(\omega \), we let \(N_{x}^{n}\left( \omega \right) \) denote the number of times we measure alternative \(x\) up to time period \(n\),
\(N_{x}^{n}(\omega )\) is an increasing sequence for all \(\omega \), so the limit \(N_{x}^{\infty }(\omega )=\lim _{n\rightarrow \infty }N_{x}^{n}(\omega )\) exists. We bound \(\left| \mu _{x}^{0,n}\right| \) from above by,
The ratio \(\frac{\beta _{x}^{n}-\beta _{x}^{0}}{\beta _{x}^{n}}\) is bounded above by 1, and the first two terms are clearly finite, so we concentrate on the finiteness of the last term. Note that \(\frac{\left( y_{x}^{j+1}-\mu _{x}\right) }{\lambda _{x}}\) has a standard normal distribution. Since normal random variables are almost surely finite, we let \(\varOmega _{0}\) be the almost sure event on which \(\left| y_{x}^{j}\right| \ne \infty \) for all \(j\in \mathbb N _{+}\). We further divide \(\varOmega _{0}\) into two sets,
where alternative \(x\) is measured finitely many times, and
where alternative \(x\) is measured infinitely often. We further define the event \(\mathcal H _{x}\) as
We will show that \(\mathbb P \left( \hat{\varOmega }_{0}\cap \mathcal H _{x}\right) =0\) and \(\mathbb P \left( \hat{\varOmega }_{0}^{C}\cap \mathcal H _{x}\right) =0\) to conclude that \(\mathbb P \left( \mathcal H _{x}\right) =\mathbb P \left( \hat{\varOmega }_{0}\cap \mathcal H _{x}\right) +\mathbb P \left( \hat{\varOmega }_{0}^{C}\cap \mathcal H _{x}\right) =0\).
For any \(\omega \in \hat{\varOmega }_{0}\cap \mathcal H _{x}\), let \(M_{x}(\omega )\) be the last time that \(x\) is measured, that is for all \(n_{1},n_{2}\ge M_{x}(\omega ),\,N_{x}^{n_{1}}(\omega )=N_{x}^{n_{2}}(\omega )\). Then, we have that
where \(M_{x}\left( \omega \right) <\infty \) by construction. However, this also implies that \(y_{x}^{j+1}=\infty \) or \(y_{x}^{j+1}=-\infty \) for at least one \(j\); therefore \(\omega \notin \hat{\varOmega }_{0}\), a contradiction. Hence, \(\mathbb P \left( \hat{\varOmega }_{0}\cap \mathcal H _{x}\right) =0\).
To show that \(\mathbb P \left( \hat{\varOmega }_{0}^{C}\cap \mathcal H _{x}\right) =0\), we let \(J_{i}:=1_{\{x^{i}=x\}}\frac{\left( y_{x}^{j+1}-\mu _{x}\right) }{\lambda _{x}}\) and recall that \(J_{i}\) has a standard normal distribution. We further define a subsequence \(G\left( \omega \right) \subset \mathbb N _{+}\) by,
and we let \(J^{*}:=\left( J_{i}\right) _{i\in G(\omega )}\). By construction, \(G\left( \omega \right) \) has countably infinitely many elements for all \(\omega \in \hat{\varOmega }_{0}^{C}\). Here, we make use of a version of the law of the iterated logarithm [3], which states that,
where \(\bar{Z}_{n}=\sum _{j=1}^{n}z_{j}/n\) and the \(z_{j}\) are i.i.d. random variables with zero mean and unit variance. We let \(\varOmega _{1}\) be the almost sure set on which this law holds for \(\bar{Z}_{n}=J_{n}^{*}\), and the proof follows by noting that \(\mathbb P \left( \hat{\varOmega }_{0}^{C}\cap \mathcal H _{x} \cap \varOmega _{1}\right) =0\). \(\square \)
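The law of the iterated logarithm invoked here implies in particular that the sample mean of i.i.d. standard normals shrinks at rate \(\sqrt{2\log \log n/n}\). A quick single-path simulation sketch of this envelope (our own illustration, not part of the proof; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
z = rng.standard_normal(n)                       # i.i.d., mean 0, variance 1
running_mean = np.cumsum(z) / np.arange(1, n + 1)

# LIL envelope sqrt(2 log log n / n); well defined for n >= 3, where log log n > 0
ns = np.arange(3, n + 1)
envelope = np.sqrt(2.0 * np.log(np.log(ns)) / ns)
```

On a typical path, the running mean is eventually dominated by the slowly shrinking envelope, which is what bounds the partial sums in the argument above.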
Lemma 2
Assume that we have a prior on each point \(\left( \beta _{x}^{0}>0,\forall x\in \mathcal X \right) \). Then for any \(x,x^{\prime }\in \mathcal X \) and \(k_{i}\in \mathcal K \), the following are finite a.s.: \(\sup _{n}\left| \mu _{x}^{i,n}\right| \), \(\sup _{n}\left| a_{x^{\prime }}^{n}(x)\right| \) and \(\sup _{n}\left| b_{x^{\prime }}^{n}(x)\right| \).
Proof
For any \(x\in \mathcal X \), \(k_{i}\in \mathcal K \) and \(n\in \mathbb N \), let \(p_{x^{\prime }}^{i,n}=\frac{\beta _{x^{\prime }}^{n}K_{i}(x,x^{\prime })}{\sum _{j=1}^{M} \beta _{x_{j}}^{n}K_{i}(x,x_{j})}\). Clearly, for any \(x^{\prime }\in \mathcal X \), all \(p_{x^{\prime }}^{i,n}\ge 0\) and \(\sum _{x^{\prime }\in \mathcal X }p_{x^{\prime }}^{i,n}=1\). That is, for any \(n\), the \(p_{x^{\prime }}^{i,n}\) are convex combination weights over the \(\mu _{x^{\prime }}^{0,n}\). Then,
The last term is finite by Lemma 1.
To show the finiteness of \(\sup _{n}|a_{x^{\prime }}^{n}(x)|\), we note that \(a_{x^{\prime }}^{n}(x)\) is a linear combination of \(\mu _{x}^{i,n}\) and \(\mu _{x^{\prime }}^{i,n}\), where the weights for \(\mu _{x}^{i,n}\) are given by \(\left( 1-\frac{\beta _{x_{n}}^{\varepsilon }K(x,x_{n})}{A_{n+1} ^{i}(x,x_{n})}\right) \) and the weight for \(\mu _{x^{\prime }}^{i,n}\) is \(\sum _{{i}\in \mathcal K }w_{x}^{i,n+1}\frac{\beta _{x_{n}}^{\varepsilon } K(x,x_{n})}{A_{n+1}^{i}(x,x_{n})}\). These weights are between 0 and 1, and the finiteness follows.
To see that \(\sup _{n}|b_{x^{\prime }}^{n}(x)|\) is finite, first note that for any \({i}\in \mathcal K \) and any \(x,x^{\prime }\in \mathcal X \),
is an increasing sequence in \(n\). Trivially, \((\sigma _{x}^{n})^{2}=1/\beta _{x}^{n}\) is a decreasing sequence in \(n\). Then for any \(n\in \mathbb N \),
As \(b_{x^{\prime }}^{n}(x)\) is a convex combination of \(\tilde{\sigma }(x,x^{\prime },i)\) where the weights are given by \(w_{x}^{i,n}\), it follows that \(\sup _{n}|b_{x^{\prime }}^{n}(x)|\) is finite. \(\square \)
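The convex-combination structure used repeatedly in this proof can be sketched as follows. This is a hypothetical illustration with made-up numbers: `beta` plays the role of posterior precisions at the measured points, `K_row` the kernel evaluations \(K_{i}(x,x_{j})\), and `mu` the point estimates being combined.

```python
import numpy as np

def convex_weights(beta, K_row):
    # p_j proportional to beta_j * K(x, x_j), normalized to sum to one
    w = beta * K_row
    return w / w.sum()

beta = np.array([1.0, 2.0, 0.5])    # precisions at measured points (hypothetical)
K_row = np.array([0.9, 0.4, 0.1])   # kernel values K(x, x_j) (hypothetical)
mu = np.array([0.3, -0.2, 1.1])     # point estimates being combined (hypothetical)

p = convex_weights(beta, K_row)
estimate = np.dot(p, mu)            # a convex combination of the entries of mu
```

Because the weights are nonnegative and sum to one, the combined estimate is trapped between the minimum and maximum of the combined quantities, which is exactly why the suprema in Lemma 2 inherit finiteness from Lemma 1.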
Lemma 3
For any \(\omega \in \varOmega \), let \(\mathcal X ^{\prime }(\omega )\) be the random set of alternatives measured infinitely often by the KGNP policy. Fix \(\omega \in \varOmega \). For any \(x\notin \mathcal X ^{\prime }(\omega )\), let \(x^{\prime }\in \mathcal X \) be an alternative such that \(x^{\prime }\ne x\), \(K_{i}(x,x^{\prime })>0\) for at least one \(k_{i}\in \mathcal K \), and \(x^{\prime }\) is measured at least once. Assume also that \(\mu _{x}\ne \mu _{x^{\prime }}\). Then \(\liminf _{n}\left| \mu _{x}^{i,n}-\mu _{x}^{0,n}\right| >0\) a.s. In other words, the estimator using kernel \(k_{i}\) is almost surely biased.
Proof
As \(x\notin \mathcal X ^{\prime }\), there is some \(N<\infty \) such that \(\mu _{x}^{0,n}=\mu _{x}^{0,N}\) for all \(n\ge N\). As \(\mu _{x}^{0,N}=\frac{\mu _{x}^{0}+\sum _{m\le N}\beta _{x}^{\varepsilon }y_{x_{m}}1_{(x_{m}=x)}}{\beta _{x}^{0}+\sum _{m\le N}\beta _{x}^{\varepsilon }1_{(x_{m}=x)}}\) is a linear combination of the normal random variables \(\left( y_{x_{m}}\right) \), it is a continuous random variable.
As \(x^{\prime }\ne x\) is measured at least once and \(K_{i}(x,x^{\prime })>0\), \(\mu _{x}^{i,n}\) contains positively weighted \(\mu _{x^{\prime }}^{0,n}\) terms. Moreover, by the assumption \(\mu _{x^{\prime }}\ne \mu _{x}\), \(\mu _{x^{\prime }}^{0,n}\) is not perfectly correlated with \(\mu _{x}^{0,n}\). Then, as both are continuous random variables, the probability that \(\mu _{x}^{0,n}\) equals any cluster point of \(\mu _{x}^{i,n}\) is zero a.s. That is, \(\liminf _{n}\left| \mu _{x}^{i,n}-\mu _{x}^{0,n}\right| >0\). \(\square \)
Remark
If the \(\mu _{x}\) are generated from a continuously distributed prior (e.g. a normal distribution), then for all \(x\ne x^{\prime }\), \(\mathbb P (\mu _{x}\ne \mu _{x^{\prime }})=1\), and the assumption of the previous lemma holds almost surely.
Lemma 4
For any \(\omega \in \varOmega \), we let \(\mathcal X ^{\prime }(\omega )\) be the random set of alternatives measured infinitely often by the KGNP policy. For all \(x,x^{\prime }\in \mathcal X \), the following holds a.s.:
-
if \(x\in \mathcal X ^{\prime }\), then \(\lim _{n}b_{x^{\prime }}^{n}(x)=0\) and \(\lim _{n}b_{x}^{n}(x^{\prime })=0,\)
-
if \(x\notin \mathcal X ^{\prime }\), then \(\liminf _{n}b_{x}^{n}(x)>0.\)
Proof
We start with the first case, \(x\in \mathcal X ^{\prime }\). If \(K_{i}(x,x^{\prime })=0\) for all \({i}\in \mathcal K \), then \(b_{x^{\prime }}^{n}(x)=b_{x}^{n}(x^{\prime })=0\) for all \(n\) by definition, and taking \(n\rightarrow \infty \) gives the result.
If \(K_{i}(x,x^{\prime })>0\) for some \({i}\in \mathcal K \), showing \(\lim _{n}b_{x^{\prime }}^{n}(x)=0\) is equivalent to showing that for all \({i}\in \mathcal K \)
As noted previously, \(A_{n}^{i}(x,x^{\prime })\) is an increasing sequence. If \(x\in \mathcal X ^{\prime }\), then we also have that, \(\beta _{x}^{n}\rightarrow \infty \), and
Therefore \(\lim _{n}b_{x^{\prime }}^{n}(x)=0\) in this case as well. Showing \(\lim _{n}b_{x}^{n}(x^{\prime })=0\) reduces to showing that,
which also follows from the above.
Now consider the second result, where \(K_{i}(x,x^{\prime })>0\) for some \({i}\in \mathcal K \) and \(x\notin \mathcal X ^{\prime }\). By the definition of \(b_{x}^{n}(x)\),
For a given \(\omega \in \varOmega \), let \(N\) be the last time that alternative \(x\) is observed. Then, for all \(n\ge N\),
Recall that \((\sigma _{x}^{n})^{2}=1/\beta _{x}^{n}\) and \(\lambda _{x}=1/\beta _{x}^{\varepsilon }\), and that these terms are finite for a finitely sampled alternative. For \(\liminf _{n}b_{x}^{n}(x)>0\) to hold, we only need to show that the weight stays bounded away from 0, that is,
Almost sure finiteness of the numerator has been shown above, which means we only need to show that
First, we divide the set of kernels into two pieces. For \(\omega \in \varOmega \), let \(\mathcal K _{1}(\omega ,x)\) be the set of kernels \(k_{i}\) for which there is at least one \(x^{\prime }\in \mathcal X ^{\prime }(\omega )\) with \(K_{i}(x,x^{\prime })>0\). In other words, some infinitely often sampled point \(x^{\prime }\) close to our original point \(x\) influences the prediction. Let \(\mathcal K _{2}(\omega ,x)=\mathcal K \backslash \mathcal K _{1}(\omega ,x)\). Now, as all terms are positive,
For all \(k_{i^{\prime }}\in \mathcal K _{1}\), Lemma 3 gives \(\liminf _{n}\nu _{x}^{{i^{\prime }},n}>0\); hence, even if \(\liminf _{n}(\sigma _{x}^{{i^{\prime }},n})^{2}=0\), the limit supremum of the first term on the right is finite. Finally, for all \({i^{\prime }}\in \mathcal K _{2}\), since none of the points used by kernel \({i^{\prime }}\) to predict \(\mu _{x}\) are sampled infinitely often, letting
where \(N_{x}\) is the last time point \(x\) is sampled, we have \(N_{x}<\infty \). Then \(\beta _{x}^{n}\) is finite for all \(x\notin \mathcal X ^{\prime }(\omega )\) (and bounded above by \(N_{x}\max _{x\notin \mathcal X ^{\prime }}\beta _{x}^{\varepsilon }\)) and
where the last term does not depend on \(n\). Taking the limit supremum over \(n\) on both sides gives the final result. \(\square \)
Barut, E., Powell, W.B. Optimal learning for sequential sampling with non-parametric beliefs. J Glob Optim 58, 517–543 (2014). https://doi.org/10.1007/s10898-013-0050-5