Abstract
Language ontologies provide an avenue for automated lexical analysis that can supplement existing information retrieval methods. This paper presents an information retrieval method that uses WordNet, a lexical database, to generate paths of abstraction, and uses those paths as the basis for an inverted index structure for retrieving documents from an indexed corpus. We present this method as an entry point to a line of research on using ontologies to perform word-sense disambiguation and to improve the precision of existing information retrieval techniques.
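The idea of indexing documents by abstraction paths can be illustrated with a minimal sketch. It is not the authors' implementation: the `HYPERNYM` map is a hypothetical toy stand-in for WordNet, and the function names (`abstraction_path`, `build_index`, `search`) are assumptions made for illustration. Each word is assumed to have a single sense and a single parent, whereas WordNet allows several senses per word and several hypernyms per sense.

```python
from collections import defaultdict

# Toy hypernym map (child -> parent). A hypothetical stand-in for WordNet,
# assuming one sense and one parent per word.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "car": "vehicle",
}

def abstraction_path(word):
    """Return the chain from the word up to its most abstract ancestor."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def build_index(corpus):
    """Map each concept on a word's abstraction path to the documents containing that word."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in text.lower().split():
            for concept in abstraction_path(word):
                index[concept].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents indexed under the query concept."""
    return sorted(index.get(query.lower(), set()))

corpus = {
    "d1": "the dog barked",
    "d2": "a cat slept",
    "d3": "the car stopped",
}
index = build_index(corpus)
print(search(index, "carnivore"))  # ['d1', 'd2']
print(search(index, "dog"))        # ['d1']
```

In this sketch a query for an abstract concept such as "carnivore" retrieves documents that mention only its more specific descendants, which is the behavior an abstraction-based inverted index is meant to provide.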
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McAllister, R.A., Angryk, R.A. (2009). An Abstraction-Based Data Model for Information Retrieval. In: Nicholson, A., Li, X. (eds) AI 2009: Advances in Artificial Intelligence. AI 2009. Lecture Notes in Computer Science, vol 5866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10439-8_57
DOI: https://doi.org/10.1007/978-3-642-10439-8_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10438-1
Online ISBN: 978-3-642-10439-8
eBook Packages: Computer Science, Computer Science (R0)