Abstract
We introduce a variant of the Perceptron algorithm, called the second-order Perceptron algorithm, which is able to exploit certain spectral properties of the data. We analyze the second-order Perceptron algorithm in the mistake bound model of on-line learning and prove bounds in terms of the eigenvalues of the Gram matrix created from the data. The performance of the second-order Perceptron algorithm is affected by the setting of a parameter that controls its sensitivity to the distribution of the eigenvalues of the Gram matrix. Since this information is not available in advance to on-line algorithms, we also design a refined version of the second-order Perceptron algorithm which sets the value of this parameter adaptively. For this second algorithm we are able to prove mistake bounds corresponding to a nearly optimal constant setting of the parameter.
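The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It assumes the commonly stated form of the second-order Perceptron: keep a vector v (the sum of y·x over mistake rounds) and the matrix A = aI + Σ x xᵀ over mistake rounds, and predict with the sign of vᵀ(A + x xᵀ)⁻¹x. The class name, the method names, and the helper `dot` are hypothetical, and `a` is assumed to be the parameter the abstract refers to. The adaptive variant is not shown.

```python
def dot(u, w):
    """Inner product of two equal-length lists of floats."""
    return sum(ui * wi for ui, wi in zip(u, w))


class SecondOrderPerceptron:
    """Illustrative sketch only; not the authors' reference implementation."""

    def __init__(self, dim, a=1.0):
        # inv stores A^{-1}, where A = a*I + sum of x x^T over mistake rounds.
        self.inv = [[(1.0 / a if i == j else 0.0) for j in range(dim)]
                    for i in range(dim)]
        self.v = [0.0] * dim  # sum of y * x over mistake rounds
        self.mistakes = 0

    def margin(self, x):
        # By the Sherman-Morrison identity,
        # v^T (A + x x^T)^{-1} x = (v^T A^{-1} x) / (1 + x^T A^{-1} x).
        u = [dot(row, x) for row in self.inv]  # u = A^{-1} x
        return dot(self.v, u) / (1.0 + dot(x, u))

    def predict(self, x):
        # Ties (margin exactly 0) are broken in favour of +1 (an assumption).
        return 1 if self.margin(x) >= 0 else -1

    def learn(self, x, y):
        """Predict on x, then update the state only if the prediction was wrong."""
        y_hat = self.predict(x)
        if y_hat != y:
            u = [dot(row, x) for row in self.inv]
            scale = 1.0 + dot(x, u)
            n = len(x)
            # Rank-one update of the inverse: A^{-1} -= u u^T / (1 + x^T u).
            for i in range(n):
                for j in range(n):
                    self.inv[i][j] -= u[i] * u[j] / scale
            self.v = [vi + y * xi for vi, xi in zip(self.v, x)]
            self.mistakes += 1
        return y_hat


# Usage on a tiny, made-up separable stream (label = sign of first coordinate).
data = [([2.0, 1.0], 1), ([-2.0, 1.0], -1), ([1.0, -1.0], 1), ([-1.0, -1.0], -1)]
model = SecondOrderPerceptron(dim=2, a=1.0)
for x, y in data:
    model.learn(x, y)
```

Maintaining the inverse with rank-one updates avoids solving a linear system on every round; a large `a` makes the matrix close to a multiple of the identity, so the predictions approach those of the standard Perceptron.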
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cesa-Bianchi, N., Conconi, A., Gentile, C. (2002). A Second-Order Perceptron Algorithm. In: Kivinen, J., Sloan, R.H. (eds) Computational Learning Theory. COLT 2002. Lecture Notes in Computer Science, vol 2375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45435-7_9
DOI: https://doi.org/10.1007/3-540-45435-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43836-6
Online ISBN: 978-3-540-45435-9
eBook Packages: Springer Book Archive