Abstract
Use of corpora by language service providers and language professionals remains limited because competing resources (e.g. translation memory software and term bases) are likely to be perceived as less demanding in terms of the time and effort required to obtain and (learn to) use them. These resources, however, have limitations that could be compensated for by integrating comparable corpora and corpus building tools into the translator’s toolkit. This chapter provides an overview of the ways in which different types of comparable corpora can be used in translation teaching and practice. First, two traditional corpus typologies are presented, namely small and specialized “handmade” corpora collected by end users themselves for a specific task, and large and general “manufactured” corpora collected by expert teams and made available to end users. We suggest that striking a middle ground between these two opposites is vital for professional uptake. To this end, we show how the BootCaT toolkit can be used to construct largish and relatively specialized comparable corpora for a specific translation task, and how, by varying the search parameters in very simple ways, the size and usability of the corpora thus constructed can be further increased. The process is exemplified with reference to a simulated task (the translation of a patient information leaflet from English into Italian), and its efficacy is evaluated through an end-user questionnaire.
Notes
1. This may change in the future, as more tools like Linguee (http://www.linguee.com/) provide access to the aligned web, and possibly to subsections of it.
2. As suggested in the previous section, in this chapter we are not specifically discussing aligned parallel corpora. In our view these are more akin to TMs than to comparable corpora in terms of the technical issues involved in their construction and consultation, and of the type of insights translators can obtain from them; they are therefore not directly relevant here.
3. “More advanced” corpus querying techniques, like extraction of keywords or computation of collocational scores, can of course be of great interest to translators. However, their relevance and usefulness may be hard to grasp for less corpus-savvy users, and hence they are not discussed here.
4. See http://wacky.sslmit.unibo.it/doku.php for information about ukWaC and itWaC.
5. Currently BootCaT uses Bing for URL retrieval, after both Google and Yahoo! discontinued their API services.
6.
7. In this paper we define genre (loosely based on Swales [34]) as a recognizable set of communicative events with a shared purpose and common formal features.
8. We used the frontend developed by Eros Zanchetta [37] and available here: http://bootcat.sslmit.unibo.it/.
References
Aston, G.: Corpus use and learning to translate. Textus 12, 289–314 (1999)
Baroni, M., Bernardini, S.: BootCaT: Bootstrapping corpora and terms from the web. In: Proceedings of LREC 2004, pp. 1313–1316, Lisbon. ELDA (2004)
Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang. Resour. Eval. 43(3), 209–226 (2009)
Bernardini, S., Castagnoli, S., Ferraresi, A., Gaspari, F., Zanchetta, E.: Introducing Comparapedia: a new resource for corpus-based translation studies. Paper presented at the International Symposium on Using Corpora in Contrastive and Translation Studies (UCCTS 2010), Edge Hill University, Ormskirk (July 2010)
Biber, D., Conrad, S.: Lexical bundles in conversation and academic prose. In: Hasselgård, H., Oksefjell, S. (eds.) Out of Corpora: Studies in Honour of Stig Johansson, pp. 181–190. Rodopi, Amsterdam (1999)
Bowker, L.: Computer-Aided Translation Technology: A Practical Introduction. University of Ottawa Press, Ottawa (2002)
Bowker, L.: Examining the impact of corpora on terminographic practice in the context of translation. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 211–236. Continuum, London (2011)
Castagnoli, S.: Using the web as a source of LSP corpora in the terminology classroom. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 159–172. GEDIT, Bologna (2006)
Chama, Z.: From segment focus to context orientation. TC World, 2010. online: http://www.tcworld.info/index.php?id=167
Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX 1994, pp. 23–32, Budapest (1994)
Cowie, A.P. (ed.): Phraseology: Theory, Analysis, and Applications. Oxford University Press, Oxford (2001)
Crossley, S.A., Louwerse, M.M.: Multi-dimensional register classification using bi-grams. Int. J. Corpus Linguist. 12(4), 453–478 (2007)
Crowston, K., Kwasnik, B.H.: A framework for creating a facetted classification for genres: addressing issues of multidimensionality. Hawaii International Conference on System Sciences, 4, 2004. online: http://doi.ieeecomputersociety.org/10.1109/HICSS.2004.1265268
Désilets, A., Melançon, C., Patenaude, G., Brunette, L.: How translators use tools and resources to resolve translation problems: an ethnographic study. In: Proceedings of MT Summit XII-Workshop: Beyond Translation Memories, Ottawa (2009)
Fantinuoli, C.: Specialized corpora from the web and term extraction for simultaneous interpreters. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 173–190. GEDIT, Bologna (2006)
Ferraresi, A., Bernardini, S., Picci, G., Baroni, M.: Web corpora for bilingual lexicography: a pilot study of English-French collocation extraction and translation. In: Xiao, R. (ed.) Using Corpora in Contrastive and Translation Studies, pp. 337–359. Cambridge Scholars Publishing, Newcastle (2010)
Gatto, M.: From Body to Web. An Introduction to the Web as Corpus. Laterza, Bari (2009)
Gavioli, L.: Exploring Corpora for ESP Learning. Benjamins, Amsterdam (2005)
Ghadessy, M., Henry, A., Roseberry, R.L. (eds.): Small Corpus Studies and ELT. Benjamins, Amsterdam (2001)
Goeuriot, L., Morin, M., Daille, B.: Compilation of specialized comparable corpus in French and Japanese. In: Proceedings of the ACL-IJCNLP Workshop Building and Using Comparable Corpora (BUCC 2009) (2009)
Gries, S.Th., Mukherjee, J.: Lexical gravity across varieties of English: an ICE-based study of n-grams in Asian Englishes. Int. J. Corpus Linguist. 15(4), 520–548 (2010)
Heid, U.: Corpus linguistics and lexicography. In: Kytö, M., Lüdeling, A. (eds.) Corpus Linguistics: An International Handbook, pp. 131–153. Mouton de Gruyter, Berlin (2008)
Hoey, M.: Lexical priming and translation. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 153–168. Continuum, London (2011)
MeLLANGE: Corpora and e-learning questionnaire. Results summary - professionals. Internal Document (2006)
MultiTrans. Multitrans 4(tm): Taking the multilingual textbase approach to new heights. MultiCorpora White Paper, online: http://www.multicorpora.com/lesNVIAdmin/File/MCwhitepaper1.pdf (August 2005)
Munday, J.: Looming large: a cross-linguistic analysis of semantic prosodies in comparable reference corpora. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 169–186. Continuum, London (2011)
Pearson, J.: Terms in Context. Benjamins, Amsterdam (1998)
Pearson, J.: Using parallel texts in the translator training environment. In: Zanettin, F., Bernardini, S., Stewart, D. (eds.) Corpora in Translator Education, pp. 15–24. St Jerome, Manchester (2003)
Philip, G.: Arriving at equivalence: Making a case for comparable general reference corpora in translation studies. In: Beeby, A., Rodríguez Inés, P., Sánchez-Gijón, P. (eds.) Corpus Use and Translating, pp. 59–73. Benjamins, Amsterdam (2009)
Rinsche, A., Zanotti, N.P.: Study on the Size of the Language Industry in the EU. European Commission - Directorate General for Translation, Brussels (2009)
Santini, M.: State-of-the-art on automatic genre identification. Technical Report ITRI-04-03, ITRI, University of Brighton, UK (2004)
Serianni, L.: Grammatica Italiana. UTET, Torino (1991)
Sharoff, S.: Creating general-purpose corpora using automated search engine queries. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 63–98. GEDIT, Bologna (2006)
Swales, J.: Genre Analysis. English in Academic and Research Settings. Cambridge University Press, Cambridge (1990)
Varantola, K.: Translators and disposable corpora. In: Zanettin, F., Bernardini, S., Stewart, D. (eds.) Corpora in Translator Education, pp. 55–70. St Jerome, Manchester (2003)
Williams, I.A.: A translator’s reference needs: dictionaries or parallel texts. Target 8, 277–299 (1996)
Zanchetta, E.: Corpora for the masses: the BootCaT front-end. Pecha Kucha Presented at the Corpus Linguistics 2011 Conference. University of Birmingham, Birmingham (July 2011)
Acknowledgments
We would like to thank the students and colleagues who kindly agreed to evaluate the URLs for us, Claudia Lecci for her expert insights about TM software, and Federico Gaspari for fruitful lunchtime discussions on corpus construction strategies, as well as the anonymous reviewer and the editors of the volume for their valuable feedback and suggestions.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bernardini, S., Ferraresi, A. (2013). Old Needs, New Solutions: Comparable Corpora for Language Professionals. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds) Building and Using Comparable Corpora. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_16
DOI: https://doi.org/10.1007/978-3-642-20128-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20127-1
Online ISBN: 978-3-642-20128-8
eBook Packages: Computer Science (R0)