Abstract
In a seminal paper, Amari (1998) proved that learning can be made more efficient when one exploits the intrinsic Riemannian structure of an algorithm's parameter space to point the gradient towards better solutions. In this paper, we show that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, are instances of a generalization of Amari's natural gradient to certain non-Riemannian spaces. These algorithms exploit an intrinsic dual geometric structure of the parameter space, tied to the particular integral losses they minimize. We unify several of them, including AdaBoost, additive regression with the square loss, the logistic loss, and the top-down induction performed in CART and C4.5, into a single algorithm for which we prove general convergence to the optimum and explicit convergence rates under very weak assumptions. As a consequence, many of the classification-calibrated surrogates of Bartlett et al. (2006) admit efficient minimization algorithms.
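As a small illustration of the last claim, the sketch below fits a linear separator by plain gradient descent on the logistic loss, one of the classification-calibrated surrogates in the sense of Bartlett et al. (2006). It is only a minimal Python sketch, not the chapter's geometric algorithm; the function name fit_linear_logistic, the step size and the iteration count are illustrative choices.

# Minimal sketch: gradient descent on the logistic surrogate loss
# L(w) = mean_i log(1 + exp(-y_i <w, x_i>)), with labels y_i in {-1, +1}.
import numpy as np

def fit_linear_logistic(X, y, lr=0.1, n_iter=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)                                 # per-example margins y_i <w, x_i>
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n     # gradient of the logistic loss
        w -= lr * grad
    return w

# Toy usage: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
w = fit_linear_logistic(X, y)
print("training error:", np.mean(np.sign(X @ w) != y))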
References
Amari, S.-I.: Natural Gradient works efficiently in Learning. Neural Computation 10, 251–276 (1998)
Azran, A., Meir, R.: Data dependent risk bounds for hierarchical mixture of experts classifiers. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 427–441. Springer, Heidelberg (2004)
Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. on Information Theory 51, 2664–2669 (2005)
Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)
Bartlett, P., Jordan, M., McAuliffe, J.D.: Convexity, classification, and risk bounds. Journal of the Am. Stat. Assoc. 101, 138–156 (2006)
Bartlett, P., Traskin, M.: AdaBoost is consistent. In: NIPS*19 (2006)
Blake, C.L., Keogh, E., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys. 7, 200–217 (1967)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth (1984)
Collins, M., Schapire, R., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. In: COLT 2000, pp. 158–169 (2000)
Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.: Information-theoretic metric learning. In: ICML 2007 (2007)
Dhillon, I., Sra, S.: Generalized non-negative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems, vol. 18 (2005)
Freund, Y., Schapire, R.E.: A Decision-Theoretic generalization of on-line learning and an application to Boosting. Journal of Comp. Syst. Sci. 55, 119–139 (1997)
Friedman, J., Hastie, T., Tibshirani, R.: Additive Logistic Regression: a Statistical View of Boosting. Ann. of Stat. 28, 337–374 (2000)
Gates, G.W.: The Reduced Nearest Neighbor rule. IEEE Trans. on Information Theory 18, 431–433 (1972)
Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: NIPS*11, pp. 225–231 (1998)
Gentile, C., Warmuth, M.: Proving relative loss bounds for on-line learning algorithms using Bregman divergences. In: Tutorials of the 13th International Conference on Computational Learning Theory (2000)
Grünwald, P., Dawid, P.: Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Statistics 32, 1367–1433 (2004)
Henry, C., Nock, R., Nielsen, F.: ℝeal boosting à la carte with an application to boosting Oblique Decision Trees. In: Proc. of the 21st International Joint Conference on Artificial Intelligence, pp. 842–847 (2007)
Herbster, M., Warmuth, M.: Tracking the best regressor. In: COLT 1998, pp. 24–31 (1998)
Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)
Kearns, M.J.: Thoughts on hypothesis boosting, ML class project (1988)
Kearns, M.J., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. Journal of Comp. Syst. Sci. 58, 109–128 (1999)
Kearns, M.J., Valiant, L.: Cryptographic limitations on learning boolean formulae and finite automata. In: Proc. of the 21st ACM Symposium on the Theory of Computing, pp. 433–444 (1989)
Kivinen, J., Warmuth, M., Hassibi, B.: The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Trans. on Signal Processing 54, 1782–1793 (2006)
Kohavi, R.: The power of Decision Tables. In: Proc. of the 10th European Conference on Machine Learning, pp. 174–189 (1995)
Matsushita, K.: Decision rule, based on distance, for the classification problem. Ann. of the Inst. for Stat. Math. 8, 67–77 (1956)
Mitchell, T.M.: The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University (1980)
Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of \({\mathcal{U}}\)-Boost and Bregman divergence. Neural Computation 16, 1437–1481 (2004)
Nielsen, F., Boissonnat, J.-D., Nock, R.: On Bregman Voronoi diagrams. In: Proc. of the 19th ACM-SIAM Symposium on Discrete Algorithms, pp. 746–755 (2007)
Nielsen, F., Boissonnat, J.-D., Nock, R.: Bregman Voronoi Diagrams: properties, algorithms and applications, 45 p. (submission, 2008)
Nock, R.: Inducing interpretable Voting classifiers without trading accuracy for simplicity: theoretical results, approximation algorithms, and experiments. Journal of Artificial Intelligence Research 17, 137–170 (2002)
Nock, R., Nielsen, F.: A ℝeal Generalization of discrete AdaBoost. Artif. Intell. 171, 25–41 (2007)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann, San Francisco (1993)
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics 26, 1651–1686 (1998)
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 297–336 (1999)
Valiant, L.G.: A theory of the learnable. Communications of the ACM 27, 1134–1142 (1984)
Vapnik, V.: Statistical Learning Theory. John Wiley, Chichester (1998)
Warmuth, M., Liao, J., Rätsch, G.: Totally corrective boosting algorithms that maximize the margin. In: ICML 2006, pp. 1001–1008 (2006)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nock, R., Nielsen, F. (2009). Intrinsic Geometries in Learning. In: Nielsen, F. (eds) Emerging Trends in Visual Computing. ETVC 2008. Lecture Notes in Computer Science, vol 5416. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00826-9_8
DOI: https://doi.org/10.1007/978-3-642-00826-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00825-2
Online ISBN: 978-3-642-00826-9