Text Categorization: An Experiment Using Phrases

  • Conference paper
Advances in Information Retrieval (ECIR 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2291)

Abstract

Typical text classifiers learn from example training documents that have been manually categorized. In this research, we classified news wire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We ran our experiments on the Reuters-21578 text corpus and used precision and recall to measure the effectiveness of our classifier. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. A possible explanation is that a word considered together with its adjoining word (a phrase) can be a good discriminator when building a category profile. This tight packaging of word pairs could bring in some semantic value. It also filters out words that occur frequently in isolation but carry little weight in characterizing the category.
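
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how features were selected or how documents were matched against profiles, so ranking word pairs by raw frequency, the `top_k` and `min_hits` parameters, all function names, and the toy training texts are assumptions made only for illustration.

```python
from collections import Counter


def adjacent_pairs(text):
    """Return adjoining word pairs ("phrases") from lowercased, whitespace-split text."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_profile(training_docs, top_k=5):
    """Keep the top_k most frequent word pairs in a category's training documents.

    Assumption: frequency ranking stands in for the paper's feature selection.
    """
    counts = Counter()
    for doc in training_docs:
        counts.update(adjacent_pairs(doc))
    return {phrase for phrase, _ in counts.most_common(top_k)}


def classify(doc, profiles, min_hits=1):
    """Assign every category whose profile shares at least min_hits phrases with doc."""
    phrases = set(adjacent_pairs(doc))
    return {cat for cat, profile in profiles.items()
            if len(phrases & profile) >= min_hits}


def precision_recall(predicted, actual, category):
    """Precision and recall for one category over parallel lists of label sets."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if category in p and category in a)
    fp = sum(1 for p, a in pairs if category in p and category not in a)
    fn = sum(1 for p, a in pairs if category not in p and category in a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training data (hypothetical, not taken from Reuters-21578).
training = {
    "grain": ["wheat exports rose sharply", "wheat exports fell"],
    "money": ["interest rates rose", "interest rates fell again"],
}
profiles = {cat: build_profile(docs) for cat, docs in training.items()}

print(classify("wheat exports climbed", profiles))
print(classify("prices rose", profiles))
```

In this toy setup the word "rose" appears in both categories' training texts, so on its own it would not separate them. Because the profiles hold only word pairs, a document such as "prices rose" matches neither category, while "wheat exports climbed" matches the grain profile through the pair "wheat exports".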

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kongovi, M., Guzman, J.C., Dasigi, V. (2002). Text Categorization: An Experiment Using Phrases. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_15

  • DOI: https://doi.org/10.1007/3-540-45886-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43343-9

  • Online ISBN: 978-3-540-45886-9

  • eBook Packages: Springer Book Archive
