Definition
Consider a given random variable \(\underline{F}\) and a random variable that we can modify, \(\hat{\underline{F}}\). We wish to use a sample of \(\hat{\underline{F}}\) as an estimate of a sample of \(\underline{F}\). The mean squared error (MSE) between such a pair of samples is a sum of four terms. The first term reflects the statistical coupling between \(\underline{F}\) and \(\hat{\underline{F}}\) and is conventionally ignored in bias-variance analysis. The second term reflects the inherent noise in \(\underline{F}\) and is independent of the estimator \(\hat{\underline{F}}\). Accordingly, we cannot affect this term. In contrast, the third and fourth terms depend on \(\hat{\underline{F}}\). The third term, called the bias, is independent of the precise samples of both \(\underline{F}\) and \(\hat{\underline{F}}\), and reflects the difference between the means of \(\underline{F}\) and \(\hat{\underline{F}}\). The fourth term, called the variance, is independent of \(\underline{F}\) altogether, and reflects the inherent variability of \(\hat{\underline{F}}\) about its own mean.
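Under quadratic loss this decomposition can be written out explicitly. The following display is the standard expansion, stated in the notation of the paragraph above:

\[
\mathbb{E}\!\left[\left(\underline{F} - \hat{\underline{F}}\right)^{2}\right]
= -2\,\mathrm{Cov}\!\left(\underline{F}, \hat{\underline{F}}\right)
+ \mathrm{Var}\!\left(\underline{F}\right)
+ \left(\mathbb{E}[\underline{F}] - \mathbb{E}[\hat{\underline{F}}]\right)^{2}
+ \mathrm{Var}\!\left(\hat{\underline{F}}\right),
\]

where the four terms are, in order, the coupling term, the noise, the (squared) bias, and the variance.

Because the decomposition is an algebraic identity, it can be checked numerically. The sketch below is illustrative only and not part of the original entry: the coupled Gaussian pair, the particular constants, and the names F and F_hat are assumptions chosen so that every term is nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative coupled pair (assumption): F is the target, F_hat the estimator.
# A shared component Z induces nonzero covariance, i.e., statistical coupling.
Z = rng.normal(size=n)
F = 1.0 + Z                                            # target: mean 1, variance 1
F_hat = 1.5 + 0.5 * Z + rng.normal(scale=0.3, size=n)  # biased, coupled estimator

mse = np.mean((F - F_hat) ** 2)

coupling = -2.0 * np.cov(F, F_hat, ddof=0)[0, 1]  # term 1: statistical coupling
noise = np.var(F)                                 # term 2: inherent noise in F
bias_sq = (F.mean() - F_hat.mean()) ** 2          # term 3: squared bias
variance = np.var(F_hat)                          # term 4: estimator variance

# With matching sample moments (ddof=0 throughout), the direct MSE and the
# four-term sum agree up to floating-point error.
print(mse, coupling + noise + bias_sq + variance)
```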
Recommended Reading
Angluin, D. (1992). Computational learning theory: Survey and selected bibliography. In Proceedings of the twenty-fourth annual ACM symposium on theory of computing. New York: ACM.
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis. New York: Springer.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (1996b). Stacked regressions. Machine Learning, 24(1), 49–64.
Buntine, W., & Weigend, A. (1991). Bayesian back-propagation. Complex Systems, 5, 603–643.
Ermoliev, Y. M., & Norkin, V. I. (1998). Monte carlo optimization and path dependent nonstationary laws of large numbers. Technical Report IR-98-009. International Institute for Applied Systems Analysis, Austria.
Lepage, G. P. (1978). A new algorithm for adaptive multidimensional integration. Journal of Computational Physics, 27, 192–203.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge, UK: Cambridge University Press.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. New York: Springer.
Rubinstein, R., & Kroese, D. (2004). The cross-entropy method. New York: Springer.
Smyth, P., & Wolpert, D. (1999). Linearly combining density estimators via stacking. Machine Learning, 36(1–2), 59–83.
Vapnik, V. N. (1982). Estimation of dependences based on empirical data. New York: Springer.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Wolpert, D. H. (1997). On bias plus variance. Neural Computation, 9, 1211–1244.
Wolpert, D. H., & Rajnarayan, D. (2007). Parametric learning and monte carlo optimization. arXiv:0704.1274v1 [cs.LG].
Wolpert, D. H., Strauss, C. E. M., & Rajnarayan, D. (2006). Advances in distributed optimization using probability collectives. Advances in Complex Systems, 9(4), 383–436.