Text Categorization: An Experiment Using Phrases

Kongovi, Madhusudhan; Guzman, Juan Carlos; Dasigi, Venu

doi:10.1007/3-540-45886-7_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2291)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Typical text classifiers learn from example training documents that have been manually categorized. In this research, we classified news wire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We ran our experiments on the Reuters-21578 text corpus and measured the effectiveness of our classifier with precision and recall. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. A possible reason is that a word taken together with its adjoining word (a phrase) can be a good discriminator when building a category profile. This tight packaging of word pairs could bring in some semantic value. It also filters out words that occur frequently in isolation but carry little weight in characterizing the category.
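The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features were selected or how documents were matched against a profile, so the document-frequency threshold (`min_df`), the hit-count matching rule (`min_hits`), and all function names below are assumptions made for illustration. The training and test sentences are invented toy data, not Reuters-21578 articles, and the numbers printed say nothing about the paper's results.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t]

def features(text, use_phrases):
    """Return single words, or adjacent word pairs when use_phrases is True."""
    tokens = tokenize(text)
    if use_phrases:
        return [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens

def build_profile(docs, use_phrases, min_df=2):
    """Assumed selection rule: keep features found in at least min_df documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(features(doc, use_phrases)))
    return {feat for feat, n in doc_freq.items() if n >= min_df}

def matches(doc, profile, use_phrases, min_hits=1):
    """Assumed matching rule: assign the category on min_hits profile features."""
    hits = len(set(features(doc, use_phrases)) & profile)
    return hits >= min_hits

def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented toy data for one hypothetical category ("grain").
TRAIN = [
    "wheat prices rose as grain exports grew",
    "grain exports fell while wheat prices held",
]
TEST = [
    ("grain exports rose sharply this week", True),
    ("oil prices rose as crude exports grew", False),
    ("wheat prices fell on weak demand", True),
]

def evaluate(use_phrases):
    """Build a profile from TRAIN and score it on TEST."""
    profile = build_profile(TRAIN, use_phrases)
    predicted = [matches(doc, profile, use_phrases) for doc, _ in TEST]
    actual = [label for _, label in TEST]
    return precision_recall(predicted, actual)

for name, use_phrases in (("words", False), ("phrases", True)):
    p, r = evaluate(use_phrases)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# words: precision=0.67 recall=1.00
# phrases: precision=1.00 recall=1.00
```

In this toy case the word profile contains "prices" and "exports", which also occur in the unrelated oil sentence and cause a false positive. The phrase profile contains only "wheat prices" and "grain exports", so the isolated words no longer match. This is the filtering effect the abstract describes, shown on data constructed to exhibit it.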

Author information

Authors and Affiliations

Southern Polytechnic State University, 1100 S. Marietta Parkway, Marietta, GA 30060
Madhusudhan Kongovi, Juan Carlos Guzman & Venu Dasigi

Editor information

Editors and Affiliations

Department of Computer and Information Sciences, University of Strathclyde, 26 Richmond Street, G1 1XH, Glasgow, UK
Fabio Crestani
School of Information and Communication Technologies, University of Paisley, High Street, PA1 2BE, Paisley, UK
Mark Girolami
Computing Science Department, University of Glasgow, 17 Lilybank Gardens, G12 8RZ, Glasgow, UK
Cornelis Joost van Rijsbergen

Cite this paper

Kongovi, M., Guzman, J.C., Dasigi, V. (2002). Text Categorization: An Experiment Using Phrases. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_15

DOI: https://doi.org/10.1007/3-540-45886-7_15
Published: 14 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43343-9
Online ISBN: 978-3-540-45886-9
eBook Packages: Springer Book Archive
