Abstract
This paper compares n-way (n > 2) classification using a single probabilistic classifier with classification using multiple binary probabilistic classifiers. We describe the use of binary classifiers in both Round Robin and Elimination tournaments, and compare both tournament methods with n-way classification on the task of determining the language of origin of speakers (both native and non-native English speakers) from their spoken English. We conducted hundreds of experiments, varying both the number of categories and the categories themselves. In all experiments the tournament methods performed better than the n-way classifier, and of the two tournament methods, Round Robin performed slightly better on average than the Elimination tournament.
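The two tournament schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the pairing order, tie-breaking rule, or classifier interface, so everything below is an assumption made for illustration: `decide` is a hypothetical callable standing in for a trained binary classifier, ties in Round Robin are broken by list order, and an unpaired category in an Elimination round receives a bye.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical interface: given a document and two candidate categories,
# return whichever of the two the binary classifier prefers.
PairwiseClassifier = Callable[[str, str, str], str]


def round_robin(doc: str, categories: Sequence[str],
                decide: PairwiseClassifier) -> str:
    """Play every pair of categories once; return the category with most wins.

    Assumes `categories` is non-empty.
    """
    wins = {c: 0 for c in categories}
    for a, b in combinations(categories, 2):
        wins[decide(doc, a, b)] += 1
    # Assumption: ties go to the category listed first.
    return max(categories, key=lambda c: wins[c])


def elimination(doc: str, categories: Sequence[str],
                decide: PairwiseClassifier) -> str:
    """Single elimination: winners advance each round until one remains.

    Assumes `categories` is non-empty.
    """
    remaining = list(categories)
    while len(remaining) > 1:
        next_round = []
        # Pair off neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(decide(doc, remaining[i], remaining[i + 1]))
        # Assumption: with an odd count, the last category gets a bye.
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining[0]
```

For n categories, Round Robin consults n(n-1)/2 binary classifiers per document, whereas Elimination consults n-1, so the two schemes trade accuracy against the number of pairwise decisions.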
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guthrie, L., Liu, W., Xia, Y. (2005). Text Classification with Tournament Methods. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_10
DOI: https://doi.org/10.1007/11551874_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science (R0)