
Intrinsic Geometries in Learning

Chapter in: Emerging Trends in Visual Computing (ETVC 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5416)


Abstract

In a seminal paper, Amari (1998) proved that learning can be made more efficient when one uses the intrinsic Riemannian structure of the algorithms’ parameter spaces to point the gradient towards better solutions. In this paper, we show that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, are spawns of a generalization of Amari’s natural gradient to particular non-Riemannian spaces. These algorithms exploit an intrinsic dual geometric structure of the parameter space that is tied to the particular integral losses to be minimized. We unify several of them, including AdaBoost, additive regression with the square loss, the logistic loss, and the top-down induction performed in CART and C4.5, into a single algorithm for which we show general convergence to the optimum and explicit convergence rates under very weak assumptions. As a consequence, many of the classification-calibrated surrogates of Bartlett et al. (2006) admit efficient minimization algorithms.
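To make the surrogate-loss viewpoint in the abstract concrete, here is a minimal sketch of discrete AdaBoost (Freund and Schapire, reference 13) written as greedy minimization of the exponential surrogate loss sum_i exp(-y_i H(x_i)), with decision stumps as weak learners. It illustrates one member of the family the chapter unifies; it is not the chapter's generalized algorithm, and the helper names (fit_stump, adaboost) and the choice of stumps are assumptions made here for illustration.

```python
# Sketch only: discrete AdaBoost as greedy minimization of the exponential
# surrogate loss, using decision stumps as weak learners. Helper names are
# illustrative, not taken from the chapter.
import numpy as np

def fit_stump(X, y, w):
    """Brute-force search for the (feature, threshold, polarity) stump
    with minimum weighted error under the distribution w."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)          # feature, threshold, polarity, error
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(X, j, thr, pol):
    return pol * np.where(X[:, j] <= thr, 1, -1)

def adaboost(X, y, T=20):
    """X: (n, d) array, y: labels in {-1, +1}. Returns (alpha, stump) pairs."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)             # dual weights over the examples
    ensemble = []
    for _ in range(T):
        j, thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-12)
        if err >= 0.5:                  # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)   # leveraging coefficient
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)       # exponential-loss reweighting
        w = w / w.sum()                         # renormalize to a distribution
        ensemble.append((alpha, (j, thr, pol)))
    return ensemble

def predict(ensemble, X):
    H = sum(a * stump_predict(X, j, thr, pol) for a, (j, thr, pol) in ensemble)
    return np.sign(H)
```

The multiplicative reweighting followed by renormalization is the step that the dual (Bregman) reading of boosting interprets as a Kullback-Leibler projection of the example weights back onto the probability simplex; see Collins, Schapire, and Singer (reference 10).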


References

  1. Amari, S.-I.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)

  2. Azran, A., Meir, R.: Data dependent risk bounds for hierarchical mixture of experts classifiers. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 427–441. Springer, Heidelberg (2004)

  3. Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. on Information Theory 51, 2664–2669 (2005)

  4. Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)

  5. Bartlett, P., Jordan, M., McAuliffe, J.D.: Convexity, classification, and risk bounds. Journal of the Am. Stat. Assoc. 101, 138–156 (2006)

  6. Bartlett, P., Traskin, M.: AdaBoost is consistent. In: NIPS*19 (2006)

  7. Blake, C.L., Keogh, E., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

  8. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys. 7, 200–217 (1967)

  9. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth (1984)

  10. Collins, M., Schapire, R., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. In: COLT 2000, pp. 158–169 (2000)

  11. Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.: Information-theoretic metric learning. In: ICML 2007 (2007)

  12. Dhillon, I., Sra, S.: Generalized non-negative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems, vol. 18 (2005)

  13. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Comp. Syst. Sci. 55, 119–139 (1997)

  14. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Ann. of Stat. 28, 337–374 (2000)

  15. Gates, G.W.: The reduced nearest neighbor rule. IEEE Trans. on Information Theory 18, 431–433 (1972)

  16. Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: NIPS*11, pp. 225–231 (1998)

  17. Gentile, C., Warmuth, M.: Proving relative loss bounds for on-line learning algorithms using Bregman divergences. In: Tutorials of the 13th International Conference on Computational Learning Theory (2000)

  18. Grünwald, P., Dawid, P.: Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Statistics 32, 1367–1433 (2004)

  19. Henry, C., Nock, R., Nielsen, F.: ℝeal boosting à la carte with an application to boosting oblique decision trees. In: Proc. of the 21st International Joint Conference on Artificial Intelligence, pp. 842–847 (2007)

  20. Herbster, M., Warmuth, M.: Tracking the best regressor. In: COLT 1998, pp. 24–31 (1998)

  21. Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)

  22. Kearns, M.J.: Thoughts on hypothesis boosting. ML class project (1988)

  23. Kearns, M.J., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. Journal of Comp. Syst. Sci. 58, 109–128 (1999)

  24. Kearns, M.J., Valiant, L.: Cryptographic limitations on learning Boolean formulae and finite automata. In: Proc. of the 21st ACM Symposium on the Theory of Computing, pp. 433–444 (1989)

  25. Kivinen, J., Warmuth, M., Hassibi, B.: The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Trans. on Signal Processing 54, 1782–1793 (2006)

  26. Kohavi, R.: The power of decision tables. In: Proc. of the 10th European Conference on Machine Learning, pp. 174–189 (1995)

  27. Matsushita, K.: Decision rule, based on distance, for the classification problem. Ann. of the Inst. for Stat. Math. 8, 67–77 (1956)

  28. Mitchell, T.M.: The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University (1980)

  29. Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of U-Boost and Bregman divergence. Neural Computation, 1437–1481 (2004)

  30. Nielsen, F., Boissonnat, J.-D., Nock, R.: On Bregman Voronoi diagrams. In: Proc. of the 19th ACM-SIAM Symposium on Discrete Algorithms, pp. 746–755 (2007)

  31. Nielsen, F., Boissonnat, J.-D., Nock, R.: Bregman Voronoi diagrams: properties, algorithms and applications, 45 p. (submission, 2008)

  32. Nock, R.: Inducing interpretable voting classifiers without trading accuracy for simplicity: theoretical results, approximation algorithms, and experiments. Journal of Artificial Intelligence Research 17, 137–170 (2002)

  33. Nock, R., Nielsen, F.: A ℝeal generalization of discrete AdaBoost. Artif. Intell. 171, 25–41 (2007)

  34. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann, San Francisco (1993)

  35. Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics 26, 1651–1686 (1998)

  36. Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning Journal 37, 297–336 (1999)

  37. Valiant, L.G.: A theory of the learnable. Communications of the ACM 27, 1134–1142 (1984)

  38. Vapnik, V.: Statistical Learning Theory. John Wiley, Chichester (1998)

  39. Warmuth, M., Liao, J., Rätsch, G.: Totally corrective boosting algorithms that maximize the margin. In: ICML 2006, pp. 1001–1008 (2006)




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nock, R., Nielsen, F. (2009). Intrinsic Geometries in Learning. In: Nielsen, F. (ed.) Emerging Trends in Visual Computing. ETVC 2008. Lecture Notes in Computer Science, vol 5416. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00826-9_8


  • DOI: https://doi.org/10.1007/978-3-642-00826-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00825-2

  • Online ISBN: 978-3-642-00826-9

