
Conceptual Indexing Based on Document Content Representation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3507)

Abstract

This paper addresses an important problem related to the use of semantics in information retrieval (IR): how to represent document semantics and how to use that representation properly in retrieval. The approach we propose represents the content of a document by its best semantic network, called the document semantic core, which is built in two main steps. In the first step, concepts (words and phrases) are extracted from the document, driven by an external general-purpose ontology, namely WordNet. In the second step, a global disambiguation of the extracted concepts with respect to the document yields the best semantic network. The selected concepts form the nodes of the semantic network, and similarity measure values between connected nodes weight the links. The resulting scored concepts are used for conceptual indexing of the document in information retrieval.
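The two steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the toy sense inventory `SENSES` stands in for WordNet lookups, the hand-written `SIM` table stands in for a real similarity measure, and the scoring rule (sum of similarities to every candidate sense of the other concepts) is one plausible reading of "global disambiguation". The names `similarity`, `sense_score`, and `semantic_core` are hypothetical.

```python
from itertools import combinations

# Hypothetical sense inventory: extracted concept -> candidate senses.
# In the paper's setting these would come from WordNet.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "loan": ["loan#finance"],
    "interest": ["interest#finance", "interest#curiosity"],
}

# Hypothetical symmetric similarity values between senses of different concepts.
SIM = {
    frozenset(("bank#finance", "loan#finance")): 0.9,
    frozenset(("bank#finance", "interest#finance")): 0.8,
    frozenset(("loan#finance", "interest#finance")): 0.85,
    frozenset(("bank#finance", "interest#curiosity")): 0.1,
    frozenset(("bank#river", "loan#finance")): 0.1,
    frozenset(("bank#river", "interest#finance")): 0.1,
    frozenset(("bank#river", "interest#curiosity")): 0.05,
    frozenset(("loan#finance", "interest#curiosity")): 0.1,
}


def similarity(a, b):
    """Look up the similarity of two senses; unknown pairs count as 0."""
    return SIM.get(frozenset((a, b)), 0.0)


def sense_score(concept, sense, senses):
    """Score a sense by its total similarity to all senses of the other concepts."""
    return sum(
        similarity(sense, other)
        for other_concept, candidates in senses.items()
        if other_concept != concept
        for other in candidates
    )


def semantic_core(senses):
    """Pick the best-scoring sense per concept, then link the selected senses."""
    chosen = {}
    for concept, candidates in senses.items():
        chosen[concept] = max(
            candidates, key=lambda s: sense_score(concept, s, senses)
        )
    # Nodes are the selected senses; links are weighted by similarity.
    links = {}
    for a, b in combinations(chosen.values(), 2):
        weight = similarity(a, b)
        if weight > 0:
            links[(a, b)] = weight
    return chosen, links


if __name__ == "__main__":
    chosen, links = semantic_core(SENSES)
    print(chosen)
    print(links)
```

With this toy data the financial senses reinforce one another, so they are selected for all three concepts and the resulting network has three weighted links. A real system would replace `SENSES` and `similarity` with ontology lookups and a WordNet-based measure.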



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baziz, M., Boughanem, M., Aussenac-Gilles, N. (2005). Conceptual Indexing Based on Document Content Representation. In: Crestani, F., Ruthven, I. (eds) Context: Nature, Impact, and Role. CoLIS 2005. Lecture Notes in Computer Science, vol 3507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11495222_14

  • DOI: https://doi.org/10.1007/11495222_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26178-0

  • Online ISBN: 978-3-540-32101-9

  • eBook Packages: Computer Science, Computer Science (R0)
