Abstract
Text classification, whether by topic or genre, is an important task that contributes to text extraction, retrieval, summarization and question answering. In this paper we present a new pairwise ensemble approach, which uses pairwise Support Vector Machine (SVM) classifiers as base classifiers and an “input-dependent latent variable” method for model combination. This new approach better captures the characteristics of genre classification, including its heterogeneous nature. Our experiments on two multi-genre collections and one topic-based classification dataset show that the pairwise ensemble method outperforms both boosting, which has been shown to be a powerful ensemble approach, and Error-Correcting Output Codes (ECOC), which applies pairwise-like classifiers to multiclass classification problems.
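To make the pairwise idea concrete, the following is a minimal sketch of an ensemble built from one binary classifier per pair of classes. It is not the method of the paper: a simple perceptron stands in for the SVM base classifier, and unweighted majority voting stands in for the input-dependent latent variable combination, whose details the abstract does not give. All function names, the toy genre labels and the feature vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_perceptron(examples, epochs=20):
    """Train a linear perceptron on (features, label) pairs, labels in {-1, +1}.

    Assumption: this is a stand-in for the SVM base classifier named in the abstract.
    """
    dim = len(examples[0][0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified: move the boundary
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
    return weights, bias


def train_pairwise(data):
    """Train one binary classifier for every unordered pair of classes."""
    classes = sorted({label for _, label in data})
    models = {}
    for a, b in combinations(classes, 2):
        # Keep only the documents of the two classes; class a is the positive side.
        subset = [(x, 1 if y == a else -1) for x, y in data if y in (a, b)]
        models[(a, b)] = train_perceptron(subset)
    return models


def predict(models, features):
    """Combine the pairwise decisions by unweighted majority vote.

    Assumption: voting replaces the paper's input-dependent combination.
    """
    votes = Counter()
    for (a, b), (weights, bias) in models.items():
        score = sum(w * x for w, x in zip(weights, features)) + bias
        votes[a if score > 0 else b] += 1
    return votes.most_common(1)[0][0]


# Toy data: three hypothetical genres with 2-d feature vectors.
toy = [
    ([2.0, 0.1], "news"), ([1.8, 0.3], "news"),
    ([0.1, 2.0], "review"), ([0.3, 1.7], "review"),
    ([-2.0, -1.9], "faq"), ([-1.7, -2.1], "faq"),
]
models = train_pairwise(toy)
print(predict(models, [1.5, 0.2]))
```

With k classes the sketch trains k(k-1)/2 base classifiers, three for the toy data. An input-dependent combination would replace the fixed one-vote-per-classifier rule in `predict` with weights that vary with the document being classified.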
Keywords
- Support Vector Machine
- Text Categorization
- Ensemble Approach
- Latent Variable Approach
- Hierarchical Mixture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Carbonell, J., Jin, R. (2003). A New Pairwise Ensemble Approach for Text Classification. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science, vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_26
DOI: https://doi.org/10.1007/978-3-540-39857-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive