Abstract
We extend a multi-class categorization scheme proposed by Dietterich and Bakiri (1995), which combines binary classifiers by means of error correcting codes. The extension comprises computing the codes with a simulated annealing algorithm and optimizing the Kullback-Leibler (KL) distances between the text categories that are merged within the code-words. For the first time, we apply the scheme to text categorization with support vector machines (SVMs) on several large text corpora with more than 100 categories. The results are compared to 1-of-N coding (i.e. one SVM for each text category). We find that, with increasing code length, error correcting codes perform better than 1-of-N coding. For very long codes, the performance is in some cases further improved by KL-distance optimization.
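The decoding idea behind error correcting output codes can be illustrated with a minimal sketch. It is not the authors' implementation: a random code matrix stands in for the codes computed by simulated annealing, the per-bit predictions that would come from SVMs are simulated by flipping bits, and all function names are hypothetical.

```python
import random


def one_of_n(n_classes):
    """1-of-N coding: one bit (one binary classifier) per class."""
    return [tuple(int(i == k) for i in range(n_classes)) for k in range(n_classes)]


def random_code_matrix(n_classes, code_length, seed=0):
    """Draw one distinct random binary code-word per class.

    Assumption: random codes replace the annealing-optimized codes
    described in the abstract.
    """
    if 2 ** code_length < n_classes:
        raise ValueError("code_length too short for distinct code-words")
    rng = random.Random(seed)
    while True:
        codes = [tuple(rng.randint(0, 1) for _ in range(code_length))
                 for _ in range(n_classes)]
        if len(set(codes)) == n_classes:
            return codes


def hamming(a, b):
    """Number of positions in which two code-words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(codes):
    """Smallest Hamming distance between any two code-words."""
    return min(hamming(a, b)
               for i, a in enumerate(codes) for b in codes[i + 1:])


def decode(bits, codes):
    """Return the class whose code-word is closest to the predicted bits."""
    return min(range(len(codes)), key=lambda k: hamming(bits, codes[k]))


codes = random_code_matrix(n_classes=4, code_length=15)
d = min_row_distance(codes)
correctable = (d - 1) // 2  # bit errors that nearest-code-word decoding tolerates

# Simulate wrong binary classifiers by flipping the first `correctable` bits.
true_class = 2
received = list(codes[true_class])
for i in range(correctable):
    received[i] ^= 1

print("minimum distance:", d, "correctable bit errors:", correctable)
print("decoded class:", decode(received, codes))
print("1-of-N minimum distance:", min_row_distance(one_of_n(4)))
```

With 1-of-N coding any two code-words differ in exactly two positions, so a single wrong binary classifier can already make the decision ambiguous, whereas longer codes with larger minimum distance tolerate several wrong bits.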
References
Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. In Proc. 17th International Conf. on Machine Learning, pages 9–16. Morgan Kaufmann, San Francisco, CA, 2000.
Koby Crammer and Yoram Singer. On the learnability and design of output codes for multiclass problems. In Computational Learning Theory, pages 35–46, 2000.
J. Diederich, J. Kindermann, E. Leopold, and G. Paass. Authorship attribution with support vector machines. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Techniques, 2001. In press.
T.G. Dietterich and G. Bakiri. Solving multiclass learning via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.
S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In 7th International Conference on Information and Knowledge Management, 1998.
Stephen E. Fienberg. The analysis of cross-classified categorical data. 1980.
Y. Guermeur, A. Elisseeff, and H. Paugam-Moisy. A new multi-class SVM based on a uniform convergence result. In S.-I. Amari, C.L. Giles, M. Gori, and V. Piuri, editors, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks IJCNN 2000, pages IV-183–IV-188, Los Alamitos, 2000. IEEE Computer Society.
C.-W. Hsu and C.J. Lin. A comparison on methods for multi-class support vector machines. Unpublished manuscript, see http://www.csie.ntu.edu.tw/~cjlin/papers/multisvm.ps.gz, April 2001.
T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In C. Nedellec and C. Rouveirol, editors, European Conference on Machine Learning (ECML), 1998.
J. Kindermann, E. Leopold, and G. Paass. Multi-class classification with error correcting codes. Technical report, GMD, Oct 2000. Beiträge zum Treffen der GI Fachgruppe 1.1.3 Maschinelles Lernen.
E. Leopold and J. Kindermann. Text categorization with support vector machines. How to represent texts in input space? Machine Learning, 2001. In press.
C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
J. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12. MIT Press, 2000.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kindermann, J., Paass, G., Leopold, E. (2001). Error Correcting Codes with Optimized Kullback-Leibler Distances for Text Categorization. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_22
DOI: https://doi.org/10.1007/3-540-44794-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive