Evaluation of IR Strategies for Polish

  • Conference paper
Advances in Natural Language Processing (NLP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Abstract

The paper presents the results and conclusions of an ad hoc evaluation lab on information retrieval for Polish. A corpus of approximately one million document descriptions of Polish Europeana resources was indexed and matched against a set of fifty test queries. Different pre-processing procedures, as well as different indexing and term-weighting approaches, were applied and evaluated, and the efficiency of different IR models was compared. Finally, human relevance assessments were provided for the retrieved documents.

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Akasereh, M., Malak, P., Pawłowski, A. (2014). Evaluation of IR Strategies for Polish. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_38

  • DOI: https://doi.org/10.1007/978-3-319-10888-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10887-2

  • Online ISBN: 978-3-319-10888-9

  • eBook Packages: Computer Science, Computer Science (R0)
