Abstract
This paper addresses the use of ontologies in Information Retrieval. It introduces an approach for representing document content by ontology-document matching. The approach detects concepts (single-word and multiword) in a document using a general-purpose ontology, namely WordNet. Two criteria are then applied: co-occurrence, to identify the important concepts in a document, and semantic similarity, to compute the semantic relatedness between these concepts and then disambiguate them. The result is a set of scored concept senses (nodes) with weighted links, called the semantic core of the document, which best represents its semantic content. We regard the proposed and evaluated approach as a short but strong step toward the long-term goal of Intelligent Indexing and Semantic Retrieval.
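The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology below stands in for WordNet, the gloss-overlap measure stands in for whichever semantic similarity measure the paper uses, and the names TOY_ONTOLOGY, detect_concepts, overlap and semantic_core are hypothetical. The sketch detects single-word and multiword terms by longest match, counts their occurrences, and keeps the sense combination whose pairwise similarities sum highest.

```python
from collections import Counter
from itertools import product

# Hypothetical toy ontology standing in for WordNet:
# term -> {sense id: set of gloss words}.
TOY_ONTOLOGY = {
    "bank": {
        "bank#finance": {"money", "deposit", "loan", "institution"},
        "bank#river": {"river", "slope", "water", "land"},
    },
    "loan": {"loan#finance": {"money", "lend", "interest", "institution"}},
    "interest rate": {
        "interest_rate#finance": {"money", "loan", "interest", "percentage"},
    },
}


def detect_concepts(tokens, ontology, max_len=3):
    """Greedy longest-match detection of single-word and multiword terms."""
    found = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in ontology:
                found[term] += 1
                i += n
                break
        else:
            i += 1  # no ontology term starts at this token
    return found


def overlap(gloss_a, gloss_b):
    """Assumed similarity: number of gloss words two senses share."""
    return len(gloss_a & gloss_b)


def semantic_core(tokens, ontology):
    """Return one sense per detected term, term frequencies and weighted links."""
    freqs = detect_concepts(tokens, ontology)
    terms = sorted(freqs)
    best_score, best_senses, best_links = -1, (), {}
    # Exhaustive search over sense combinations; only viable for toy inputs.
    for senses in product(*(sorted(ontology[t]) for t in terms)):
        links = {}
        for a in range(len(terms)):
            for b in range(a + 1, len(terms)):
                links[(senses[a], senses[b])] = overlap(
                    ontology[terms[a]][senses[a]],
                    ontology[terms[b]][senses[b]],
                )
        score = sum(links.values())
        if score > best_score:
            best_score, best_senses, best_links = score, senses, links
    return {
        "senses": dict(zip(terms, best_senses)),
        "frequencies": dict(freqs),
        "links": best_links,
    }


core = semantic_core(
    "the bank approved the loan at a low interest rate".split(), TOY_ONTOLOGY
)
```

In this toy input, "bank" resolves to its financial sense because that sense shares gloss words with the senses of "loan" and "interest rate", while the river sense shares none. A real system would query WordNet for the candidate senses and would need something cheaper than exhaustive search over sense combinations.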
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baziz, M. (2004). Towards a Semantic Representation of Documents by Ontology-Document Mapping. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_4
DOI: https://doi.org/10.1007/978-3-540-30106-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22959-9
Online ISBN: 978-3-540-30106-6