Abstract
Information retrieval systems (IRSs) usually suffer from a low ability to recognize the same idea when it is expressed in different forms. One way of improving these systems is to take morphological variants into account. We propose here a simple yet effective method to recognize these variants, which are then used to enrich queries. In contrast to previously published methods, our system does not need any external resources or a priori knowledge and thus supports many languages. This new approach is evaluated on several collections covering 6 different languages and is compared to existing tools such as a stemmer and a lemmatizer. Reported results show a significant and systematic improvement in overall IRS effectiveness, in terms of both precision and recall, for every language.
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Moreau, F., Claveau, V., Sébillot, P. (2007). Automatic Morphological Query Expansion Using Analogy-Based Machine Learning. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_22
DOI: https://doi.org/10.1007/978-3-540-71496-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science (R0)