Abstract
This paper presents the results and conclusions of an ad hoc evaluation lab on information retrieval for Polish. A corpus of about one million document descriptions of Polish Europeana resources was indexed and matched against a set of fifty test queries. Different pre-processing procedures, as well as different indexing and term-weighting approaches, were applied and evaluated, and the efficiency of different IR models was compared. Finally, human relevance assessments were provided for the retrieved documents.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Akasereh, M., Malak, P., Pawłowski, A. (2014). Evaluation of IR Strategies for Polish. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_38
DOI: https://doi.org/10.1007/978-3-319-10888-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)