Abstract
Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In this paper we discuss for the first time the problem of designing output codes for multiclass problems. For the design problem of discrete codes, which have been used extensively in previous works, we present mostly negative results. We then introduce the notion of continuous codes and cast the design problem of continuous codes as a constrained optimization problem. We describe three optimization problems corresponding to three different norms of the code matrix. Interestingly, for the l2 norm our formalism results in a quadratic program whose dual does not depend on the length of the code. A special case of our formalism provides a multiclass scheme for building support vector machines which can be solved efficiently. We give a time- and space-efficient algorithm for solving the quadratic program. Preliminary experiments with synthetic data show that our algorithm is often two orders of magnitude faster than standard quadratic programming packages. We conclude by discussing the generalization properties of the algorithm.
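As an illustration (not part of the paper itself), the distinction the abstract draws between discrete and continuous output codes shows up already at decoding time: a discrete code picks the class whose code-matrix row is closest in Hamming distance to the binary classifiers' outputs, while a continuous code can score rows by inner product with real-valued classifier confidences. The code matrix and classifier outputs below are hypothetical, chosen only to make the sketch runnable:

```python
# Sketch of decoding with an output code. Assumptions: binary predictions
# in {-1, +1}; a hypothetical code matrix M with 4 classes and 5 binary
# classifiers. Neither M nor the predictions come from the paper.

def hamming_decode(code_matrix, predictions):
    """Discrete code: index of the row closest in Hamming distance
    to the vector of binary classifier outputs."""
    def dist(row):
        return sum(1 for m, p in zip(row, predictions) if m != p)
    return min(range(len(code_matrix)), key=lambda r: dist(code_matrix[r]))

def inner_product_decode(code_matrix, confidences):
    """Continuous-code variant: index of the row with the largest
    inner product with real-valued classifier outputs."""
    def score(row):
        return sum(m * f for m, f in zip(row, confidences))
    return max(range(len(code_matrix)), key=lambda r: score(code_matrix[r]))

# Hypothetical code matrix: 4 classes (rows), 5 binary problems (columns).
M = [
    [+1, +1, +1, +1, +1],
    [-1, -1, +1, +1, -1],
    [+1, -1, -1, +1, -1],
    [-1, +1, -1, -1, +1],
]

preds = [-1, -1, +1, -1, -1]    # outputs of the 5 binary classifiers
print(hamming_decode(M, preds))  # row 1 differs in only one position
```

The design problem the paper studies is the converse of this sketch: rather than fixing M and training classifiers against it, the code matrix itself is optimized, and for the l2 norm that optimization becomes a quadratic program.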
Cite this article
Crammer, K., Singer, Y. On the Learnability and Design of Output Codes for Multiclass Problems. Machine Learning 47, 201–233 (2002). https://doi.org/10.1023/A:1013637720281