Abstract
The information retrieval (IR) methods employed for the third participation of the University of Hagen in the domain-specific task of the Cross Language Evaluation Campaign (CLEF 2005) provide a better baseline for experiments with natural language processing (NLP) methods in domain-specific IR than the methods employed in our previous participations. The baseline combines state-of-the-art IR methods with NLP methods for document and query processing.
Our monolingual experiments with German documents combine several methods to improve performance: an entry vocabulary module (EVM), query expansion with semantically related concepts, and a blind feedback technique. These experiments focus on comparing two techniques for constructing database queries: creating a ‘bag of words’ and creating a semantic network through deep linguistic analysis of the query.
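As an illustration only, the following Python sketch shows generic blind (pseudo-relevance) feedback applied to a bag-of-words query: an initial ranking is computed, the top-k documents are assumed to be relevant, and their most frequent terms not already in the query are appended to it. The function names, the simple term-overlap ranking, and the parameters `k` and `m` are hypothetical choices for this sketch, not the configuration used in these experiments; the EVM and the semantic expansion are not shown.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real German pipeline would also normalize, stem, or split compounds.
    return text.lower().split()


def blind_feedback(query: str, docs: list[str], k: int = 2, m: int = 2) -> list[str]:
    """Expand a bag-of-words query with the m most frequent new terms
    found in the top-k documents of an initial ranking (hypothetical sketch)."""
    q_terms = tokenize(query)
    q_set = set(q_terms)

    # Initial ranking: score each document by its number of query-term occurrences.
    scored = []
    for idx, doc in enumerate(docs):
        score = sum(1 for t in tokenize(doc) if t in q_set)
        scored.append((score, idx))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    top_docs = [docs[idx] for _, idx in scored[:k]]

    # Count candidate expansion terms that do not already occur in the query.
    counts = Counter(t for doc in top_docs for t in tokenize(doc) if t not in q_set)
    expansion = [term for term, _ in counts.most_common(m)]
    return q_terms + expansion


docs = [
    "arbeitslosigkeit jugend ausbildung ausbildung",
    "jugend ausbildung beruf",
    "wetter regen sonne",
]
print(blind_feedback("arbeitslosigkeit jugend", docs))
```

Here the two top-ranked documents contribute "ausbildung" (three occurrences) and "beruf" (one occurrence), so the expanded query becomes `['arbeitslosigkeit', 'jugend', 'ausbildung', 'beruf']`.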
For the bilingual experiments, the English topics are translated into German queries using several publicly available machine translation (MT) services. Each set of translated topics is processed separately with the same techniques as in the monolingual experiments. Evaluation results are presented for the official experiments, which use staged logistic regression, and for additional experiments with BM25.
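For reference, here is a minimal Python sketch of Okapi BM25 scoring, the ranking function used in the additional experiments. The parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, not values reported for these experiments. The IDF uses the non-negative variant log((N - n + 0.5) / (n + 0.5) + 1), and the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs_tokens: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized document against a bag-of-words query with Okapi BM25.
    k1 and b are assumed defaults, not the settings used in the paper."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[t] + k1 * (1.0 - b + b * dl / avgdl)
            score += idf * tf[t] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


docs = [["jugend", "arbeit"], ["jugend", "jugend", "schule"], ["wetter"]]
print(bm25_scores(["jugend"], docs))
```

The second document ranks highest because it contains "jugend" twice. Its score grows less than linearly, because k1 saturates term frequency and b penalizes the document's above-average length. The third document scores zero.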
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leveling, J. (2006). A Baseline for NLP in Domain-Specific IR. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_26
DOI: https://doi.org/10.1007/11878773_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45697-1
Online ISBN: 978-3-540-45700-8
eBook Packages: Computer Science, Computer Science (R0)