
Query or Document Translation for Academic Search – What’s the Real Difference?

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2020)

Abstract

We compare query translation and document translation from and to English, French, German and Spanish for multilingual retrieval in an academic search portal, PubPsych. Both approaches improve the system's retrieval performance, with document translation yielding better results. The improvement is inversely correlated with the number of documents already available in the original language: the more documents a language already has, the smaller the gains. With English as the source language, translation does not improve retrieval performance, because most documents in our collection already contained English-language content. The large-scale evaluation is based on a corpus of more than 1M metadata documents and 50 real queries taken from the portal's query logs.
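The abstract contrasts two places where translation can happen in a multilingual search pipeline. The following is a minimal sketch, assuming a toy word-by-word dictionary in place of a real MT system and plain term-overlap scoring in place of the portal's actual ranking. With query translation, the query is translated into every language at search time and matched against the untranslated records; with document translation, every record is translated into every language at indexing time and the original query is matched against that augmented index. The names `DICTIONARY`, `translate`, `search_query_translation`, `index_document_translation` and `search_document_translation`, as well as the sample records, are hypothetical illustrations and not the paper's implementation.

```python
# Minimal sketch contrasting query translation (QT) and document translation (DT).
# ASSUMPTION: a toy word-by-word dictionary stands in for a real MT system, and
# ranking is plain term overlap. All names and records here are hypothetical.

# Hypothetical bilingual dictionary keyed by (source_lang, target_lang).
DICTIONARY = {
    ("de", "en"): {"angst": "anxiety", "kindern": "children"},
    ("en", "de"): {"anxiety": "angst", "children": "kindern"},
    ("fr", "en"): {"anxiété": "anxiety", "enfants": "children"},
    ("en", "fr"): {"anxiety": "anxiété", "children": "enfants"},
}
LANGS = ["en", "de", "fr"]

# Hypothetical metadata records: doc_id -> (language, text).
DOCS = {
    "d1": ("en", "anxiety in children"),
    "d2": ("de", "angst bei kindern"),
    "d3": ("fr", "anxiété des enfants"),
    "d4": ("en", "memory in adults"),
}


def tokenize(text):
    return text.lower().split()


def translate(tokens, src, tgt):
    """Word-by-word lookup; tokens without an entry pass through unchanged."""
    if src == tgt:
        return list(tokens)
    table = DICTIONARY.get((src, tgt), {})
    return [table.get(t, t) for t in tokens]


def score(query_terms, doc_tokens):
    """Number of distinct query terms that occur in the document."""
    doc_set = set(doc_tokens)
    return sum(1 for t in set(query_terms) if t in doc_set)


def rank(scored):
    """Drop non-matching documents; order by score, then by doc_id."""
    return sorted((r for r in scored if r[1] > 0), key=lambda r: (-r[1], r[0]))


def search_query_translation(query, query_lang, docs):
    """QT: translate the query into every language at search time,
    then match the expanded query against the untranslated records."""
    q = tokenize(query)
    expanded = set()
    for lang in LANGS:
        expanded.update(translate(q, query_lang, lang))
    return rank((doc_id, score(expanded, tokenize(text)))
                for doc_id, (_, text) in docs.items())


def index_document_translation(docs):
    """DT: at indexing time, store each record together with its
    translations into every language."""
    index = {}
    for doc_id, (lang, text) in docs.items():
        tokens = tokenize(text)
        index[doc_id] = [t for tgt in LANGS for t in translate(tokens, lang, tgt)]
    return index


def search_document_translation(query, index):
    """DT: the untranslated query is matched against the augmented index."""
    q = tokenize(query)
    return rank((doc_id, score(q, tokens)) for doc_id, tokens in index.items())


if __name__ == "__main__":
    print(search_query_translation("anxiety children", "en", DOCS))
    print(search_document_translation("anxiety children",
                                      index_document_translation(DOCS)))
```

In this toy setting both strategies retrieve the same three records. What differs is where the translation cost is paid (once per query at search time versus once per record at indexing time) and how much context the translator sees, which is where a real MT system's output for a short query and for a full record can diverge.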

Notes

  1. https://www.pubpsych.eu.

  2. https://www.psyndex.de.

  3. https://www.nlm.nih.gov/pubs/factsheets/medline.html.

  4. A reviewer of this paper pointed out that recall-oriented searches for systematic reviews are another important use case for academic search portals. This use case was not addressed in this study.

  5. This dataset is available at https://github.com/clubs-project/documentation/.

  6. https://github.com/alueschow/clubs-compa.

Acknowledgments

This research was supported by the Leibniz-Gemeinschaft under grant SAW-2016-ZPID-2.

Author information

Correspondence to Vivien Petras.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Petras, V., Lüschow, A., Ramthun, R., Stiller, J., España-Bonet, C., Henning, S. (2020). Query or Document Translation for Academic Search – What’s the Real Difference? In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2020. Lecture Notes in Computer Science, vol. 12260. Springer, Cham. https://doi.org/10.1007/978-3-030-58219-7_3

  • DOI: https://doi.org/10.1007/978-3-030-58219-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58218-0

  • Online ISBN: 978-3-030-58219-7

  • eBook Packages: Computer Science; Computer Science (R0)
