Abstract
Typical text classifiers learn from example training documents that have been manually categorized. In this research, we experimented with classifying newswire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We used the Reuters-21578 text corpus for our experiments and measured the effectiveness of our classifier with precision and recall. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. One possible reason is that a word taken together with its adjoining word (a phrase) can be a good discriminator when building a category profile. This tight packaging of word pairs could bring in some semantic value. Pairing words also filters out words that occur frequently in isolation but carry little weight in characterizing the category.
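As a rough illustration of the idea, the following Python sketch builds category profiles from either single words or adjacent word pairs and scores a classifier with precision and recall. It is not the authors' implementation: the abstract does not say how features were selected or how documents were matched against profiles, so the frequency-based selection, the overlap scoring, and all function names and the `top_k` parameter here are assumptions made for illustration.

```python
from collections import Counter


def features(text, use_phrases):
    """Return word features, or adjacent word-pair features if use_phrases is True."""
    tokens = text.lower().split()
    if use_phrases:
        return [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens


def build_profiles(training_docs, use_phrases, top_k=10):
    """Build one profile per category from (category, text) training pairs.

    Assumption: a profile is simply the top_k most frequent features
    seen in that category's training documents.
    """
    counts = {}
    for category, text in training_docs:
        counts.setdefault(category, Counter()).update(features(text, use_phrases))
    return {
        category: {feature for feature, _ in counter.most_common(top_k)}
        for category, counter in counts.items()
    }


def classify(text, profiles, use_phrases):
    """Assign the category whose profile shares the most features with the text.

    Assumption: plain overlap counting; returns None if nothing matches.
    """
    doc_features = set(features(text, use_phrases))
    scores = {c: len(doc_features & profile) for c, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def precision_recall(predicted, actual, category):
    """Compute precision and recall for one category from parallel label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == category and a == category for p, a in pairs)
    fp = sum(p == category and a != category for p, a in pairs)
    fn = sum(p != category and a == category for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy training data, invented for this sketch (not taken from Reuters-21578).
    training = [
        ("grain", "wheat harvest wheat exports"),
        ("crude", "crude oil crude oil prices"),
    ]
    word_profiles = build_profiles(training, use_phrases=False)
    phrase_profiles = build_profiles(training, use_phrases=True)
    print(classify("wheat exports rise", word_profiles, use_phrases=False))
    print(classify("crude oil falls", phrase_profiles, use_phrases=True))
```

Switching `use_phrases` is the only difference between the word-based and phrase-based runs, which makes it easy to compare the two representations on the same data. With word pairs, a frequent word contributes to a profile only in the company of a neighbor, which mirrors the filtering effect described above.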
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kongovi, M., Guzman, J.C., Dasigi, V. (2002). Text Categorization: An Experiment Using Phrases. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_15
DOI: https://doi.org/10.1007/3-540-45886-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43343-9
Online ISBN: 978-3-540-45886-9
eBook Packages: Springer Book Archive