Abstract
The Multilingual Student Translation (MUST) corpus is a collection of translations produced by foreign language learners or trainee translators and compiled collaboratively by a large number of partner teams internationally. The corpus represents a prime example of community sourcing, as the data are collected and shared by the members of the MUST network. Two key characteristics of the corpus are that it involves a large number of language pairs and that each text is accompanied by a rich set of standardized metadata related to the source texts, the translation tasks and the students. The web interface on which the corpus is stored allows the data to be aligned and annotated with a purpose-built translation annotation system. The resulting corpus data lend themselves to a range of applications (translator training, materials design, pedagogical lexicography) and can also be used to advance empirical research in corpus-based translation studies.
Notes
For a list of partners, see https://uclouvain.be/en/research-institutes/ilc/cecl/must-partners.html.
The project-specific interface, Hypal4MUST, is not available outside the MUST project, but researchers outside MUST can use the generic Hypal interface (see https://hypal.eu).
The initial basis for the MUST genre taxonomy is Lee's (2001) categorization of the genres included in the British National Corpus, supplemented with genres identified by the MUST community as being relevant to the project.
For this reason, student translations collected outside the MUST project cannot be included in MUST.
References
Alfuraih, R. F. (2019). The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics. Language Resources & Evaluation. https://doi.org/10.1007/s10579-019-09472-6.
Baker, M. (1993). Corpus linguistics and translation studies. Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and Technology. In Honour of John Sinclair (pp. 233–250). Amsterdam: John Benjamins.
Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target, 7(2), 223–243.
Bowker, L., & Bennison, P. (2002). Translation tracking system: A tool for managing translation archives. Proceedings of the Third International Conference on Language Resources and Evaluation (pp. 503–507). Las Palmas, Canary Islands, 29–31 May 2002.
Bowker, L., & Bennison, P. (2003). Student translation archive: design, development and application. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in Translator Education (pp. 103–117). London & New York: Routledge.
Branzov, T. (2016). Community-sourcing in virtual societies. Serdica Journal of Computing, 10(3–4), 263–284.
Castagnoli, S. (2009). Regularities and variations in learner translations: A corpus-based study of conjunctive explicitation. Unpublished PhD Thesis. Pisa University.
Castagnoli, S., Ciobanu, D., Kunz, K., Kübler, N., & Volanschi, A. (2011). Designing a learner translator corpus for training purposes. In N. Kübler (Ed.), Corpora, Language, Teaching, and Resources: From Theory to Practice (pp. 221–248). Bern: Peter Lang.
Chesterman, A. (2007). Similarity analysis and the translation profile. Belgian Journal of Linguistics, 21, 53–66.
Cosme, C. (2008). Participle clauses in learner English: the role of transfer. In G. Gilquin, S. Papp, & M. B. Díez-Bedmar (Eds.), Linking Up Contrastive and Learner Corpus Research (pp. 177–198). Amsterdam & New York: Rodopi.
Dagneaux, E., Denness, S., & Granger, S. (1998). Computer-aided error analysis. System: An International Journal of Educational Technology and Applied Linguistics, 26(2), 163–174.
Díaz-Negrillo, A., & Fernández-Domínguez, J. (2006). Error tagging systems for learner corpora. Revista Española de Lingüística Aplicada, 19, 83–102.
Espunya, A. (2014). The UPF learner translation corpus as a resource for translator training. Language Resources and Evaluation, 48, 33–43.
Fictumova, J., Obrusník, A., & Stepankova, K. (2017). Teaching specialized translation: Error-tagged translation learner corpora. Sendebar, 28, 209–241.
Florén, C. (2006). ENTRAD, an English Spanish parallel corpus created for the teaching of translation. Paper presented at the 7th Teaching and Language Corpora Conference (TALC 2006).
Gaspari, F., & Bernardini, S. (2010). Comparing non-native and translated language: Monolingual comparable corpora with a twist. In R. Xiao (Ed.), Using Corpora in Contrastive and Translation Studies (pp. 215–234). Newcastle: Cambridge Scholars Publishing.
Gillard, P., & Gadsby, A. (1998). Using a learners’ corpus in compiling ELT dictionaries. In S. Granger (Ed.), Learner English on Computer (pp. 159–171). London & New York: Addison Wesley Longman.
Graedler, A.-L. (2013). NEST—A corpus in the brooding box. Studies in Variation, Contacts and Change in English, 13. http://www.helsinki.fi/varieng/series/volumes/13/graedler/.
Granger, S. (1993). The international corpus of learner English. In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English Language Corpora: Design, Analysis and Exploitation (pp. 57–69). Amsterdam & Atlanta: Rodopi.
Granger, S. (1994). The learner corpus: A revolution in applied linguistics. English Today, 10(3), 25–33.
Granger, S. (1996). From CA to CIA and back: An integrated contrastive approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in Contrast Text-based Cross-Linguistic Studies. Lund Studies in English (88th ed., pp. 37–51). Lund: Lund University Press.
Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO, 20(3), 465–480.
Granger, S., & Lefer, M.-A. (2016). From general to learners’ bilingual dictionaries: Towards a more effective fulfilment of advanced learners’ phraseological needs. International Journal of Lexicography, 29(3), 279–295.
Halverson, S. (2017). Gravitational pull in translation: Testing a revised model. In G. De Sutter, M.-A. Lefer, & I. Delaere (Eds.), Empirical Translation Studies: New Methodological and Theoretical Traditions. Trends in Linguistics. Studies and Monographs (pp. 9–45). Berlin: De Gruyter Mouton.
Hasselgård, H., & Johansson, S. (2011). Learner corpora and contrastive interlanguage analysis. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A Taste for Corpora. In honour of Sylviane Granger (pp. 33–62). Amsterdam: John Benjamins.
Johansson, S. (2007). Seeing through Multilingual Corpora: On the use of corpora in contrastive studies. Amsterdam and Philadelphia: John Benjamins.
Kruger, H. (2018). Expanding the third code: Corpus-based studies of constrained communication and language mediation. In S. Granger, M.-A. Lefer, & L. Penha-Marion (Eds.), Book of Abstracts. Using Corpora in Contrastive and Translation Studies Conference (5th edition). CECL Papers 1 (pp. 9–12). Louvain-la-Neuve: Centre for English Corpus Linguistics/Université catholique de Louvain.
Kübler, N. (2008). A comparable Learner Translator Corpus: Creation and use. LREC 2008 Workshop on Comparable Corpora (pp. 73–78).
Kutuzov, A., & Kunilovskaya, M. (2014). Russian learner translator corpus: design, research potential and applications. In P. Sojka, A. Horák, I. Kopeček, & K. Palak (Eds.), Text, Speech and Dialogue. Lecture Notes in Computer Science (pp. 315–323). Berlin: Springer.
Lanstyák, I., & Heltai, P. (2012). Universals in language contact and translation. Across Languages and Cultures, 13(1), 99–121.
Lapshinova-Koltunski, E. (2013). VARTRA: A comparable corpus for analysis of translation variation. Proceedings of the 6th Workshop on Building and Using Comparable Corpora (pp. 77–86). Sofia, Bulgaria, 8 August 2013.
Laviosa, S. (1998). The English comparable corpus: A resource and a methodology. In L. Bowker, M. Cronin, D. Kenny, & J. Pearson (Eds.), Unity in Diversity? Current Trends in Translation Studies. Manchester: St. Jerome Publishing.
Lee, D. Y. W. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology, 5(3), 37–72.
Lefer, M.-A. (forthcoming). Parallel corpora. In M. Paquot & S. Th. Gries (Eds.), Practical Handbook of Corpus Linguistics. Berlin: Springer.
Lüdeling, A., & Hirschmann, H. (2015). Error annotation systems. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 135–157). Cambridge: Cambridge University Press.
Macken, L., De Clercq, O., & Paulussen, H. (2011). Dutch Parallel Corpus: A Balanced Copyright-cleared Parallel Corpus. Meta, 56(2), 374–390.
Maingay, S., & Rundell, M. (1987). Anticipating learners’ errors—implications for dictionary writers. In A. P. Cowie (Ed.), The Dictionary and the Language Learner (pp. 128–135). Tübingen: Niemeyer.
Obrusník, A. (2013). A hybrid approach to parallel text alignment. Bachelor thesis. Masaryk University.
Obrusník, A. (2014). Hypal: A User-Friendly Tool for Automatic Parallel Text Alignment and Error Tagging. Eleventh International Conference Teaching and Language Corpora (pp. 67–69), Lancaster, 20–23 July 2014.
Štěpánková, K. (2014). Learner Translation Corpus: CELTraC (Czech-English Learner Translation Corpus). Bachelor’s Diploma Thesis. Masaryk University.
Uzar, R. S. (2002). A corpus methodology for analysing translation. In S.E.O. Tagnin (Ed.), Cadernos de Tradução: Corpora e Tradução (pp. 235–263). Florianópolis: NUT, 1(9).
Uzar, R., & Waliński, J. (2001). Analysing the fluency of translators. International Journal of Corpus Linguistics, 6, 155–166.
Wible, D., Kuo, C.-H., Chien, F.-Y., Liu, A., & Tsao, N.-L. (2001). A Web-based EFL writing environment: Integrating information for learners, teachers, and researchers. Computers & Education, 37, 297–315.
Wurm, A. (2016). Presentation of the KOPTE Corpus and Research Project. https://www.academia.edu/24012369/Presentation_of_the_KOPTE_Corpus_and_Research_Project.
Acknowledgements
We would like to thank the MUST local coordinators—Silvia Bernardini, Łucja Biel, Mario Cal Varela, Cem Can, Sara Castagnoli, Madalina Chitez, Elisa Corino, Julie Deconinck, Gert De Sutter, Margherita Dore, Gaetano Falco, Jonė Grigaliūnienė, Sandra Louise Halverson, Ruska Ivanovska-Naskova, Marlen Izquierdo, Xu Jiajin, Gurgen Karapetyan, Natalie Kübler, Efi Lamprou, Magnus Levin, Adriana Mezeg, Christine Michaux, Marina Morbiducci, Adriane Orenha Ottaiano, Adriana Orlandi, Heloísa Orsi Koch Delgado, Jun Pan, Anastasia Parianou, Gill Philip, Éric Poirier, Juan Pedro Rica Peromingo, Carola Strobl, Jenny Ström Herold, Olympia Tsaknaki, Jurgita Vaičenonienė, Susana Valdez, Heidi Verplaetse, Andrea Wurm—for contributing their translation data to the MUST project as well as for their helpful and enthusiastic support.
We would also like to thank the two anonymous reviewers for their helpful suggestions and comments.
Cite this article
Granger, S., Lefer, MA. The Multilingual Student Translation corpus: a resource for translation teaching and research. Lang Resources & Evaluation 54, 1183–1199 (2020). https://doi.org/10.1007/s10579-020-09485-6