Abstract
This paper addresses a problem in the use of semantics in information retrieval (IR): how to represent the semantics of a document and use that representation properly in retrieval. The approach we propose represents the content of a document by its best semantic network, called the document semantic core, which is built in two main steps. In the first step, concepts (words and phrases) are extracted from the document, driven by an external general-purpose ontology, namely WordNet. In the second step, a global disambiguation of the extracted concepts with respect to the document yields the best semantic network. The selected concepts form the nodes of the semantic network, and the similarity values between connected nodes weight the links. The resulting scored concepts are then used for conceptual indexing of the document in information retrieval.
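The second step can be illustrated with a minimal sketch. It is not the authors' implementation: the sense inventory `SENSES`, the similarity table `SIM`, and the function `semantic_core` are hypothetical stand-ins for WordNet lookups and a WordNet-based similarity measure, and exhaustive search over sense combinations is assumed here only because the toy input is tiny. The sketch keeps one sense per extracted term so that the total link weight of the network is maximal, then scores each selected concept by the sum of its link weights.

```python
from itertools import combinations, product

# Hypothetical toy sense inventory (stand-in for WordNet synsets).
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "loan": ["loan#finance"],
    "interest": ["interest#finance", "interest#curiosity"],
}

# Hypothetical symmetric similarity values (stand-in for a WordNet measure).
SIM = {
    frozenset(["bank#finance", "loan#finance"]): 0.9,
    frozenset(["bank#finance", "interest#finance"]): 0.8,
    frozenset(["loan#finance", "interest#finance"]): 0.85,
    frozenset(["bank#finance", "interest#curiosity"]): 0.1,
    frozenset(["loan#finance", "interest#curiosity"]): 0.1,
    frozenset(["bank#river", "loan#finance"]): 0.1,
    frozenset(["bank#river", "interest#finance"]): 0.1,
    frozenset(["bank#river", "interest#curiosity"]): 0.05,
}


def sim(a, b):
    """Similarity between two senses; unknown pairs count as 0."""
    return SIM.get(frozenset((a, b)), 0.0)


def semantic_core(terms):
    """Pick one sense per term so that the summed link weight is maximal.

    Returns (term -> selected sense, selected sense -> node score), where a
    node score is the sum of the weights of the links attached to that node.
    """
    best_score, best_combo = -1.0, None
    # Enumerate every way of assigning one candidate sense to each term.
    for combo in product(*(SENSES[t] for t in terms)):
        score = sum(sim(a, b) for a, b in combinations(combo, 2))
        if score > best_score:
            best_score, best_combo = score, combo
    node_scores = {
        c: sum(sim(c, other) for other in best_combo if other != c)
        for c in best_combo
    }
    return dict(zip(terms, best_combo)), node_scores


selected, scores = semantic_core(["bank", "loan", "interest"])
print(selected)
print(scores)
```

In this toy input the financial senses reinforce one another, so they are the ones kept as nodes. The number of combinations grows multiplicatively with the number of ambiguous terms, so a real document would need a cheaper selection strategy than this exhaustive enumeration.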
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baziz, M., Boughanem, M., Aussenac-Gilles, N. (2005). Conceptual Indexing Based on Document Content Representation. In: Crestani, F., Ruthven, I. (eds) Context: Nature, Impact, and Role. CoLIS 2005. Lecture Notes in Computer Science, vol 3507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11495222_14
DOI: https://doi.org/10.1007/11495222_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26178-0
Online ISBN: 978-3-540-32101-9
eBook Packages: Computer Science, Computer Science (R0)