Concept Indexing for Automated Text Categorization

Gómez, José María; Cortizo, José Carlos; Puertas, Enrique; Ruiz, Miguel

doi:10.1007/978-3-540-27779-8_17


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Abstract

In this paper we explore the potential of concept indexing with WordNet synsets for Text Categorization, in comparison with the traditional bag-of-words text representation model. We report a series of experiments that also test whether simple yet robust disambiguation methods can be used for concept indexing, and assess the effectiveness of stoplist filtering and stemming on the SemCor semantic concordance. The results are not conclusive, but they are promising.
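The abstract contrasts bag-of-words features with concept indexing. A small sketch can illustrate that difference. It rests on several stated assumptions:

- It uses a tiny hypothetical sense inventory. The `SENSES` table and its `SYN_*` identifiers are invented for illustration and are not real WordNet synsets.
- It stands in for a "simple yet robust" disambiguation method with a pick-the-first-listed-sense heuristic. The abstract does not say which methods the authors actually used.
- It uses a small illustrative stoplist and omits stemming.

```python
import re
from collections import Counter

# Small illustrative stoplist (an assumption, not the paper's stoplist).
STOPLIST = {"the", "a", "an", "of", "in", "and", "to", "is"}

# Hypothetical sense inventory standing in for WordNet (NOT real WordNet data):
# each lemma maps to concept identifiers, listed in an assumed order of
# sense frequency so that the first entry plays the "most frequent sense".
SENSES = {
    "car": ["SYN_AUTOMOBILE", "SYN_RAILWAY_CAR"],
    "automobile": ["SYN_AUTOMOBILE"],
    "bank": ["SYN_RIVER_BANK", "SYN_FINANCIAL_BANK"],
    "loan": ["SYN_LOAN"],
}


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Bag-of-words: one feature per non-stopword token, with counts."""
    return Counter(t for t in tokenize(text) if t not in STOPLIST)


def concept_index(text):
    """Map each non-stopword token to a concept via its first-listed sense.

    Words missing from the hypothetical inventory are kept as word features.
    """
    features = Counter()
    for token in tokenize(text):
        if token in STOPLIST:
            continue
        senses = SENSES.get(token)
        features[senses[0] if senses else token] += 1
    return features


def overlap(a, b):
    """Shared feature occurrences between two documents (sum of minima)."""
    return sum((a & b).values())


if __name__ == "__main__":
    d1, d2 = "The car is red", "An automobile is red"
    print(overlap(bag_of_words(d1), bag_of_words(d2)))    # 1: only "red" shared
    print(overlap(concept_index(d1), concept_index(d2)))  # 2: concept + "red"
    # The naive first-sense choice maps "bank" to SYN_RIVER_BANK here.
    print(concept_index("a bank loan"))
```

In this sketch, "car" and "automobile" collapse into one concept, so the two toy documents share more features under concept indexing than under bag-of-words. The "bank loan" case shows the other side: a naive sense choice can introduce wrong concepts. That is why the quality of the disambiguation step matters for this representation.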



Author information

Authors and Affiliations

Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
José María Gómez & Enrique Puertas
AINet Solutions, 28943, Fuenlabrada, Madrid, Spain
José Carlos Cortizo & Miguel Ruiz


Editor information

Editors and Affiliations

School of Computing, Science and Engineering, Newton Building, University of Salford, M5 4WT, Greater Manchester, UK
Farid Meziane
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais


About this paper

Cite this paper

Gómez, J.M., Cortizo, J.C., Puertas, E., Ruiz, M. (2004). Concept Indexing for Automated Text Categorization. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_17


DOI: https://doi.org/10.1007/978-3-540-27779-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive
