An Extended CLEF eHealth Test Collection for Cross-Lingual Information Retrieval in the Medical Domain

Conference paper, Advances in Information Retrieval (ECIR 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11438)

Abstract

We present a test collection for medical cross-lingual information retrieval. It is built on resources used by the CLEF eHealth Evaluation Lab 2013–2015 in the patient-centered information retrieval tasks and improves the applicability and reusability of the official data. The document set is identical to the official one used for the task in 2015 and contains about one million English medical webpages. The query set contains 166 items used as test queries during the three years of the campaign, now available in eight languages. The extended test collection provides additional relevance judgements that almost double the number of officially assessed query-document pairs. This paper describes the content of the extended collection, the details of query translation and relevance assessment, and state-of-the-art results obtained on this collection.
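The extended collection adds relevance judgements that nearly double the number of assessed query-document pairs. The following is a minimal sketch of what that means for evaluation, under assumptions the abstract does not state: judgements are assumed to be stored in the common TREC qrels layout (`query_id iteration doc_id relevance`), the helper names (`parse_qrels`, `merge_qrels`, `count_pairs`, `precision_at_k`) and all query/document IDs are hypothetical, and official labels are assumed to win if both sets judge the same pair. Because unjudged documents are conventionally scored as non-relevant (as trec_eval does), added judgements can change a run's P@10.

```python
from collections import defaultdict


def parse_qrels(text):
    """Parse TREC-style qrels lines: '<query_id> <iteration> <doc_id> <relevance>'."""
    qrels = defaultdict(dict)
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, rel = parts
        qrels[qid][docid] = int(rel)
    return qrels


def merge_qrels(official, extra):
    """Union of two judgement sets; on a conflict the official label wins (assumption)."""
    merged = defaultdict(dict)
    for source in (extra, official):  # official is applied last, so it overrides
        for qid, docs in source.items():
            merged[qid].update(docs)
    return merged


def count_pairs(qrels):
    """Number of assessed (query, document) pairs."""
    return sum(len(docs) for docs in qrels.values())


def precision_at_k(ranking, judgements, k=10):
    """P@k with unjudged documents treated as non-relevant; divides by k like trec_eval."""
    hits = sum(1 for doc in ranking[:k] if judgements.get(doc, 0) > 0)
    return hits / k


# Toy data with hypothetical IDs (not taken from the collection).
official = parse_qrels("""
q1 0 doc1 2
q1 0 doc2 0
q2 0 doc3 1
""")
extra = parse_qrels("""
q1 0 doc4 1
q1 0 doc2 1
q2 0 doc5 0
""")
merged = merge_qrels(official, extra)
run_q1 = ["doc4", "doc1", "doc2", "doc9"]

print(count_pairs(official), count_pairs(merged))  # 3 5
print(precision_at_k(run_q1, official["q1"]))  # 0.1 (doc4 is unjudged)
print(precision_at_k(run_q1, merged["q1"]))    # 0.2
```

Letting the official label take precedence on conflicts is only a design choice for this sketch; if the additional judgements cover only previously unjudged pairs, the merge order makes no difference.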

Notes

  1. http://hdl.handle.net/11234/1-2925.
  2. http://trec.nist.gov.
  3. http://ntcir.nii.ac.jp.
  4. http://www.clef-initiative.eu/.
  5. https://sites.google.com/site/clefehealth/.
  6. http://lemurproject.org/clueweb12/specs.php.

Acknowledgments

The language resources presented in this paper are distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic. This work was supported by the Czech Science Foundation (grant no. P103/12/G084).

Author information

Correspondence to Shadi Saleh.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Saleh, S., Pecina, P. (2019). An Extended CLEF eHealth Test Collection for Cross-Lingual Information Retrieval in the Medical Domain. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_24

  • DOI: https://doi.org/10.1007/978-3-030-15719-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15718-0

  • Online ISBN: 978-3-030-15719-7

  • eBook Packages: Computer Science, Computer Science (R0)
