Abstract
In this paper, we introduce Sequential Classifiers Combination (SCC) into text categorization to improve both the effectiveness and the efficiency of the combined individual classifiers. In our experimental study we apply two classifiers sequentially: the first (the filtering classifier) generates candidate categories for the test document, and the second (the deciding classifier) selects the final category for the test document from those candidates. Experimental results indicate that, when boosting and kNN are combined, the combined classifier outperforms the better of the two individual classifiers; when Rocchio and kNN are combined, the combined classifier performs as well as kNN while being much more efficient than kNN, with efficiency close to that of Rocchio.
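The two-stage scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words vectors, the centroid-based filter (a simplified Rocchio-style stand-in), the similarity-weighted kNN vote, and the parameters `m` (number of candidate categories) and `k` (number of neighbours) are all assumptions chosen for illustration, as are the function names and the toy data.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts (illustrative; no stemming or weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(train):
    """Sum the vectors of each category (hypothetical Rocchio-style prototype)."""
    sums = {}
    for vec, label in train:
        sums.setdefault(label, Counter()).update(vec)
    return sums


def filter_candidates(doc, cents, m):
    """Filtering classifier: keep the m categories whose centroid is closest."""
    ranked = sorted(cents, key=lambda c: cosine(doc, cents[c]), reverse=True)
    return ranked[:m]


def decide(doc, train, candidates, k):
    """Deciding classifier: kNN vote restricted to the candidate categories."""
    pool = [(cosine(doc, vec), label) for vec, label in train if label in candidates]
    pool.sort(key=lambda p: p[0], reverse=True)
    votes = Counter()
    for sim, label in pool[:k]:
        votes[label] += sim  # similarity-weighted vote (an assumption)
    if not votes:
        return candidates[0] if candidates else None
    return votes.most_common(1)[0][0]


def scc_classify(text, train, cents, m=2, k=3):
    """Apply the filtering classifier, then the deciding classifier."""
    doc = vectorize(text)
    return decide(doc, train, filter_candidates(doc, cents, m), k)


# Toy training set, invented for this example.
train = [
    (vectorize("stocks market shares trading"), "finance"),
    (vectorize("bank interest rates market"), "finance"),
    (vectorize("football match goal team"), "sport"),
    (vectorize("tennis match player serve"), "sport"),
    (vectorize("election vote parliament party"), "politics"),
    (vectorize("minister vote policy party"), "politics"),
]
cents = centroids(train)
print(scc_classify("team wins football match", train, cents))
```

In this sketch the second stage scores only training documents that belong to the candidate categories, so it performs fewer similarity computations than a kNN pass over the whole training set. Whether this is the mechanism behind the efficiency reported in the abstract is an assumption here.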
This work was supported by the National Natural Science Foundation of China under grant No. 60373019.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Z., Zhou, S., Zhou, A. (2004). Sequential Classifiers Combination for Text Categorization: An Experimental Study. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_51
DOI: https://doi.org/10.1007/978-3-540-27772-9_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9