Abstract
In a seminal paper, Amari (1998) proved that learning can be made more efficient when one exploits the intrinsic Riemannian structure of an algorithm's parameter space to point the gradient towards better solutions. In this paper, we show that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, are instances of a generalization of Amari's natural gradient to certain non-Riemannian spaces. These algorithms exploit an intrinsic dual geometric structure of the parameter space, tied to the particular integral losses they minimize. We unify several of them, including AdaBoost, additive regression with the square loss, the logistic loss, and the top-down induction performed in CART and C4.5, into a single algorithm for which we prove general convergence to the optimum and explicit convergence rates under very weak assumptions. As a consequence, many of the classification-calibrated surrogates of Bartlett et al. (2006) admit efficient minimization algorithms.
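As a small illustration of the last claim, the sketch below fits a linear separator by plain gradient descent on the logistic loss, one of the classification-calibrated surrogates in the sense of Bartlett et al. (2006). It is only a minimal Python sketch, not the chapter's geometric algorithm; the function name fit_linear_logistic, the step size and the iteration count are illustrative choices.

# Minimal sketch: gradient descent on the logistic surrogate loss
# L(w) = mean_i log(1 + exp(-y_i <w, x_i>)), with labels y_i in {-1, +1}.
import numpy as np

def fit_linear_logistic(X, y, lr=0.1, n_iter=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)                                 # per-example margins y_i <w, x_i>
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n     # gradient of the logistic loss
        w -= lr * grad
    return w

# Toy usage: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
w = fit_linear_logistic(X, y)
print("training error:", np.mean(np.sign(X @ w) != y))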
References
Amari, S.-I.: Natural Gradient works efficiently in Learning. Neural Computation 10, 251–276 (1998)
Azran, A., Meir, R.: Data dependent risk bounds for hierarchical mixture of experts classifiers. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 427–441. Springer, Heidelberg (2004)
Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. on Information Theory 51, 2664–2669 (2005)
Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)
Bartlett, P., Jordan, M., McAuliffe, J.D.: Convexity, classification, and risk bounds. Journal of the Am. Stat. Assoc. 101, 138–156 (2006)
Bartlett, P., Traskin, M.: AdaBoost is consistent. In: NIPS*19 (2006)
Blake, C.L., Keogh, E., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys. 7, 200–217 (1967)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth (1984)
Collins, M., Schapire, R., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. In: COLT 2000, pp. 158–169 (2000)
Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.: Information-theoretic metric learning. In: ICML 2007 (2007)
Dhillon, I., Sra, S.: Generalized non-negative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems, vol. 18 (2005)
Freund, Y., Schapire, R.E.: A Decision-Theoretic generalization of on-line learning and an application to Boosting. Journal of Comp. Syst. Sci. 55, 119–139 (1997)
Friedman, J., Hastie, T., Tibshirani, R.: Additive Logistic Regression: a Statistical View of Boosting. Ann. of Stat. 28, 337–374 (2000)
Gates, G.W.: The Reduced Nearest Neighbor rule. IEEE Trans. on Information Theory 18, 431–433 (1972)
Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: NIPS*11, pp. 225–231 (1998)
Gentile, C., Warmuth, M.: Proving relative loss bounds for on-line learning algorithms using Bregman divergences. In: Tutorials of the 13th International Conference on Computational Learning Theory (2000)
Grünwald, P., Dawid, P.: Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Statistics 32, 1367–1433 (2004)
Henry, C., Nock, R., Nielsen, F.: ℝeal boosting à la carte with an application to boosting Oblique Decision Trees. In: Proc. of the 21st International Joint Conference on Artificial Intelligence, pp. 842–847 (2007)
Herbster, M., Warmuth, M.: Tracking the best regressor. In: COLT 1998, pp. 24–31 (1998)
Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)
Kearns, M.J.: Thoughts on hypothesis boosting, ML class project (1988)
Kearns, M.J., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. Journal of Comp. Syst. Sci. 58, 109–128 (1999)
Kearns, M.J., Valiant, L.: Cryptographic limitations on learning boolean formulae and finite automata. In: Proc. of the 21st ACM Symposium on the Theory of Computing, pp. 433–444 (1989)
Kivinen, J., Warmuth, M., Hassibi, B.: The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Trans. on Signal Processing 54, 1782–1793 (2006)
Kohavi, R.: The power of Decision Tables. In: Proc. of the 10th European Conference on Machine Learning, pp. 174–189 (1995)
Matsushita, K.: Decision rule, based on distance, for the classification problem. Ann. of the Inst. for Stat. Math. 8, 67–77 (1956)
Mitchell, T.M.: The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University (1980)
Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of \({\mathcal{U}}\)-Boost and Bregman divergence. Neural Computation 16, 1437–1481 (2004)
Nielsen, F., Boissonnat, J.-D., Nock, R.: On Bregman Voronoi diagrams. In: Proc. of the 19th ACM-SIAM Symposium on Discrete Algorithms, pp. 746–755 (2007)
Nielsen, F., Boissonnat, J.-D., Nock, R.: Bregman Voronoi Diagrams: properties, algorithms and applications, 45 p. (submission, 2008)
Nock, R.: Inducing interpretable Voting classifiers without trading accuracy for simplicity: theoretical results, approximation algorithms, and experiments. Journal of Artificial Intelligence Research 17, 137–170 (2002)
Nock, R., Nielsen, F.: A ℝeal Generalization of discrete AdaBoost. Artif. Intell. 171, 25–41 (2007)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann, San Francisco (1993)
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics 26, 1651–1686 (1998)
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 297–336 (1999)
Valiant, L.G.: A theory of the learnable. Communications of the ACM 27, 1134–1142 (1984)
Vapnik, V.: Statistical Learning Theory. John Wiley, Chichester (1998)
Warmuth, M., Liao, J., Rätsch, G.: Totally corrective boosting algorithms that maximize the margin. In: ICML 2006, pp. 1001–1008 (2006)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nock, R., Nielsen, F. (2009). Intrinsic Geometries in Learning. In: Nielsen, F. (eds) Emerging Trends in Visual Computing. ETVC 2008. Lecture Notes in Computer Science, vol 5416. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00826-9_8
DOI: https://doi.org/10.1007/978-3-642-00826-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00825-2
Online ISBN: 978-3-642-00826-9