Abstract
The hierarchical mixture of experts architecture provides a flexible procedure for implementing classification algorithms. The classification is obtained by a recursive soft partition of the feature space in a data-driven fashion. Such a procedure enables local classification, where several experts are used, each assigned the task of classification over some subspace of the feature space. In this work, we provide data-dependent generalization error bounds for this class of models, which lead to effective procedures for performing model selection. Tight bounds are particularly important here, because the model is highly parameterized. The theoretical results are complemented with numerical experiments based on a randomized algorithm, which mitigates the effects of the local minima that plague other approaches such as the expectation-maximization algorithm.
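For intuition, bounds of this data-dependent type are built on empirical Rademacher complexity; one standard generic form (shown here purely as an illustration, not as the paper's specific theorem) states that, with probability at least 1 - δ over an i.i.d. sample of size n, every classifier f in a class F with loss bounded in [0, 1] satisfies

    R(f) \le \hat{R}_n(f) + 2\,\hat{\mathfrak{R}}_n(\mathcal{F}) + 3\sqrt{\ln(2/\delta)/(2n)},

where the empirical complexity term \hat{\mathfrak{R}}_n(\mathcal{F}) is computed from the sample itself, which is what makes such bounds directly usable for model selection.

As for the architecture, the sketch below illustrates a two-level hierarchical-mixture-of-experts forward pass with linear softmax gates and logistic experts; the function names, the fixed two-level depth, and the particular choice of gates and experts are illustrative assumptions, not the authors' construction.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hme_predict(x, gate_root, gates, experts):
    # Hypothetical two-level HME for binary classification.
    # x         : (d,) feature vector
    # gate_root : (m, d) weights of the top-level gate (m branches)
    # gates     : list of m (k, d) second-level gate weight matrices
    # experts   : m x k nested list of (d,) expert weight vectors
    # Returns P(y = 1 | x) as the gate-weighted mixture of expert outputs.
    top = softmax(gate_root @ x)                   # soft partition at the root
    p = 0.0
    for i, g_i in enumerate(gates):
        low = softmax(g_i @ x)                     # soft partition within branch i
        for j, w_ij in enumerate(experts[i]):
            # Each expert is a logistic classifier on its (soft) region.
            p += top[i] * low[j] / (1.0 + np.exp(-(w_ij @ x)))
    return p

# Example usage with random parameters:
rng = np.random.default_rng(0)
d, m, k = 3, 2, 2
x = rng.normal(size=d)
gate_root = rng.normal(size=(m, d))
gates = [rng.normal(size=(k, d)) for _ in range(m)]
experts = [[rng.normal(size=d) for _ in range(k)] for _ in range(m)]
print(hme_predict(x, gate_root, gates, experts))   # a probability in (0, 1)

Because every gate output is a softmax rather than a hard split, the partition of the feature space is soft: every expert contributes to every prediction, with an input-dependent weight, which is the "local classification" the abstract refers to.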
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Azran, A., Meir, R. (2004). Data Dependent Risk Bounds for Hierarchical Mixture of Experts Classifiers. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol. 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_30
DOI: https://doi.org/10.1007/978-3-540-27819-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1