The Multilingual Student Translation corpus: a resource for translation teaching and research

  • Project Notes
Language Resources and Evaluation

Abstract

The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign language learners or trainee translators, collected collaboratively by a large number of partner teams around the world. The corpus is a prime example of community sourcing, as the data are collected and shared by the members of the MUST network. Two key characteristics of the corpus are that it covers a large number of language pairs and that each text is accompanied by a rich set of standardized metadata on the source texts, the translation tasks and the students. The web interface that hosts the corpus allows the data to be aligned and annotated with a purpose-built translation annotation system. The resulting corpus data lend themselves to a range of applications (translator training, materials design, pedagogical lexicography) and can also be used to advance empirical research in corpus-based translation studies.

Notes

  1. https://uclouvain.be/en/research-institutes/ilc/cecl/must.html.

  2. For a list of partners, see https://uclouvain.be/en/research-institutes/ilc/cecl/must-partners.html.

  3. The project-specific interface, Hypal4MUST, is not available outside the MUST project, but the generic Hypal interface can be used by researchers outside MUST (see https://hypal.eu).

  4. The initial basis for the MUST genre taxonomy is Lee’s (2001) categorization of the genres included in the British National Corpus, supplemented with genres that the MUST community identified as relevant to the project.

  5. For this reason, student translations collected outside the MUST project cannot be included in MUST.

References

  • Alfuraih, R. F. (2019). The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics. Language Resources & Evaluation. https://doi.org/10.1007/s10579-019-09472-6.

  • Baker, M. (1993). Corpus linguistics and translation studies. Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and Technology. In Honour of John Sinclair (pp. 233–250). Amsterdam: John Benjamins.

  • Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target, 7(2), 223–243.

  • Bowker, L., & Bennison, P. (2002). Translation tracking system: A tool for managing translation archives. Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 503–507). Las Palmas, Canary Islands, 29–31 May 2002.

  • Bowker, L., & Bennison, P. (2003). Student translation archive: design, development and application. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in Translator Education (pp. 103–117). London & New York: Routledge.

  • Branzov, T. (2016). Community-sourcing in virtual societies. Serdica Journal of Computing, 10(3–4), 263–284.

  • Castagnoli, S. (2009). Regularities and variations in learner translations: A corpus-based study of conjunctive explicitation. Unpublished PhD Thesis. Pisa University.

  • Castagnoli, S., Ciobanu, D., Kunz, K., Kübler, N., & Volanschi, A. (2011). Designing a learner translator corpus for training purposes. In N. Kübler (Ed.), Corpora, Language, Teaching, and Resources: From Theory to Practice (pp. 221–248). Bern: Peter Lang.

  • Chesterman, A. (2007). Similarity analysis and the translation profile. Belgian Journal of Linguistics, 21, 53–66.

  • Cosme, C. (2008). Participle clauses in learner English: the role of transfer. In G. Gilquin, S. Papp, & M. B. Díez-Bedmar (Eds.), Linking Up Contrastive and Learner Corpus Research (pp. 177–198). Amsterdam & New York: Rodopi.

  • Dagneaux, E., Denness, S., & Granger, S. (1998). Computer-aided error analysis. System: An International Journal of Educational Technology and Applied Linguistics, 26(2), 163–174.

  • Díaz-Negrillo, A., & Fernández-Domínguez, J. (2006). Error tagging systems for learner corpora. Revista Española de Lingüística Aplicada, 19, 83–102.

  • Espunya, A. (2014). The UPF learner translation corpus as a resource for translator training. Language Resources and Evaluation, 48, 33–43.

  • Fictumová, J., Obrusník, A., & Štěpánková, K. (2017). Teaching specialized translation: Error-tagged translation learner corpora. Sendebar, 28, 209–241.

  • Florén, C. (2006). ENTRAD, an English Spanish parallel corpus created for the teaching of translation. Paper presented at the 7th Teaching and Language Corpora Conference (TALC 2006).

  • Gaspari, F., & Bernardini, S. (2010). Comparing non-native and translated language: Monolingual comparable corpora with a twist. In R. Xiao (Ed.), Using Corpora in Contrastive and Translation Studies (pp. 215–234). Newcastle: Cambridge Scholars Publishing.

  • Gillard, P., & Gadsby, A. (1998). Using a learners’ corpus in compiling ELT dictionaries. In S. Granger (Ed.), Learner English on Computer (pp. 159–171). London & New York: Addison Wesley Longman.

  • Graedler, A.-L. (2013). NEST—A corpus in the brooding box. Studies in Variation, Contacts and Change in English, 13. http://www.helsinki.fi/varieng/series/volumes/13/graedler/.

  • Granger, S. (1993). The international corpus of learner English. In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English Language Corpora: Design, Analysis and Exploitation (pp. 57–69). Amsterdam & Atlanta: Rodopi.

  • Granger, S. (1994). The learner corpus: A revolution in applied linguistics. English Today, 10(3), 25–33.

  • Granger, S. (1996). From CA to CIA and back: An integrated contrastive approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in Contrast: Text-based Cross-Linguistic Studies (Lund Studies in English 88, pp. 37–51). Lund: Lund University Press.

  • Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO, 20(3), 465–480.

  • Granger, S., & Lefer, M.-A. (2016). From general to learners’ bilingual dictionaries: Towards a more effective fulfilment of advanced learners’ phraseological needs. International Journal of Lexicography, 29(3), 279–295.

  • Halverson, S. (2017). Gravitational pull in translation: Testing a revised model. In G. De Sutter, M.-A. Lefer, & I. Delaere (Eds.), Empirical Translation Studies: New Methodological and Theoretical Traditions. Trends in Linguistics. Studies and Monographs (pp. 9–45). Berlin: De Gruyter Mouton.

  • Hasselgård, H., & Johansson, S. (2011). Learner corpora and contrastive interlanguage analysis. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A Taste for Corpora. In honour of Sylviane Granger (pp. 33–62). Amsterdam: John Benjamins.

  • Johansson, S. (2007). Seeing through Multilingual Corpora: On the use of corpora in contrastive studies. Amsterdam and Philadelphia: John Benjamins.

  • Kruger, H. (2018). Expanding the third code: Corpus-based studies of constrained communication and language mediation. In S. Granger, M.-A. Lefer, & L. Penha-Marion (Eds.), Book of Abstracts. Using Corpora in Contrastive and Translation Studies Conference (5th edition). CECL Papers 1 (pp. 9–12). Louvain-la-Neuve: Centre for English Corpus Linguistics/Université catholique de Louvain.

  • Kübler, N. (2008). A comparable Learner Translator Corpus: Creation and use. LREC 2008 Workshop on Comparable Corpora (pp. 73–78).

  • Kutuzov, A., & Kunilovskaya, M. (2014). Russian learner translator corpus: design, research potential and applications. In P. Sojka, A. Horák, I. Kopeček, & K. Pala (Eds.), Text, Speech and Dialogue. Lecture Notes in Computer Science (pp. 315–323). Berlin: Springer.

  • Lanstyák, I., & Heltai, P. (2012). Universals in language contact and translation. Across Languages and Cultures, 13(1), 99–121.

  • Lapshinova-Koltunski, E. (2013). VARTRA: A comparable corpus for analysis of translation variation. Proceedings of the 6th Workshop on Building and Using Comparable Corpora (pp. 77–86). Sofia, Bulgaria, 8 August 2013.

  • Laviosa, S. (1998). The English comparable corpus: A resource and a methodology. In L. Bowker, M. Cronin, D. Kenny, & J. Pearson (Eds.), Unity in Diversity? Current Trends in Translation Studies. Manchester: St. Jerome Publishing.

  • Lee, D. Y. W. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology, 5(3), 37–72.

  • Lefer, M.-A. (forthcoming). Parallel corpora. In M. Paquot, & S. Th. Gries (Eds.), Practical Handbook of Corpus Linguistics. Berlin: Springer.

  • Lüdeling, A., & Hirschmann, H. (2015). Error annotation systems. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 135–157). Cambridge: Cambridge University Press.

  • Macken, L., De Clercq, O., & Paulussen, H. (2011). Dutch Parallel Corpus: A Balanced Copyright-cleared Parallel Corpus. Meta, 56(2), 374–390.

  • Maingay, S., & Rundell, M. (1987). Anticipating learners’ errors—implications for dictionary writers. In A. P. Cowie (Ed.), The Dictionary and the Language Learner (pp. 128–135). Tübingen: Niemeyer.

  • Obrusník, A. (2013). A hybrid approach to parallel text alignment. Bachelor thesis. Masaryk University.

  • Obrusník, A. (2014). Hypal: A User-Friendly Tool for Automatic Parallel Text Alignment and Error Tagging. Eleventh International Conference Teaching and Language Corpora (pp. 67–69), Lancaster, 20–23 July 2014.

  • Štěpánková, K. (2014). Learner Translation Corpus: CELTraC (Czech-English Learner Translation Corpus). Bachelor’s Diploma Thesis. Masaryk University.

  • Uzar, R. S. (2002). A corpus methodology for analysing translation. In S.E.O. Tagnin (Ed.), Cadernos de Tradução: Corpora e Tradução (pp. 235–263). Florianópolis: NUT, 1(9).

  • Uzar, R., & Waliński, J. (2001). Analysing the fluency of translators. International Journal of Corpus Linguistics, 6, 155–166.

  • Wible, D., Kuo, C.-H., Chien, F.-Y., Liu, A., & Tsao, N.-L. (2001). A Web-based EFL writing environment: Integrating information for learners, teachers, and researchers. Computers & Education, 37, 297–315.

  • Wurm, A. (2016). Presentation of the KOPTE Corpus and Research Project. https://www.academia.edu/24012369/Presentation_of_the_KOPTE_Corpus_and_Research_Project.

Acknowledgements

We would like to thank the MUST local coordinators—Silvia Bernardini, Łucja Biel, Mario Cal Varela, Cem Can, Sara Castagnoli, Madalina Chitez, Elisa Corino, Julie Deconinck, Gert De Sutter, Margherita Dore, Gaetano Falco, Jonė Grigaliūnienė, Sandra Louise Halverson, Ruska Ivanovska-Naskova, Marlen Izquierdo, Xu Jiajin, Gurgen Karapetyan, Natalie Kübler, Efi Lamprou, Magnus Levin, Adriana Mezeg, Christine Michaux, Marina Morbiducci, Adriane Orenha Ottaiano, Adriana Orlandi, Heloísa Orsi Koch Delgado, Jun Pan, Anastasia Parianou, Gill Philip, Éric Poirier, Juan Pedro Rica Peromingo, Carola Strobl, Jenny Ström Herold, Olympia Tsaknaki, Jurgita Vaičenonienė, Susana Valdez, Heidi Verplaetse, Andrea Wurm—for contributing their translation data to the MUST project as well as for their helpful and enthusiastic support.

We would also like to thank the two anonymous reviewers for their helpful suggestions and comments.

Author information

Correspondence to Sylviane Granger.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Granger, S., Lefer, MA. The Multilingual Student Translation corpus: a resource for translation teaching and research. Lang Resources & Evaluation 54, 1183–1199 (2020). https://doi.org/10.1007/s10579-020-09485-6

  • DOI: https://doi.org/10.1007/s10579-020-09485-6
