Old Needs, New Solutions: Comparable Corpora for Language Professionals

Bernardini, Silvia; Ferraresi, Adriano

doi:10.1007/978-3-642-20128-8_16

Silvia Bernardini & Adriano Ferraresi


Abstract

Use of corpora by language service providers and language professionals remains limited, owing to competing resources that are likely to be perceived as less demanding in terms of the time and effort required to obtain them and (learn to) use them (e.g. translation memory software, term bases and so forth). These resources, however, have limitations that could be compensated for by integrating comparable corpora and corpus-building tools into the translator’s toolkit. This chapter provides an overview of the ways in which different types of comparable corpora can be used in translation teaching and practice. First, two traditional corpus typologies are presented: small, specialized “handmade” corpora collected by end users themselves for a specific task, and large, general “manufactured” corpora collected by expert teams and made available to end users. We suggest that striking a middle ground between these two extremes is vital for professional uptake. To this end, we show how the BootCaT toolkit can be used to construct largish and relatively specialized comparable corpora for a specific translation task, and how, by varying the search parameters in very simple ways, the size and usability of the corpora thus constructed can be further increased. The process is exemplified with reference to a simulated task (the translation of a patient information leaflet from English into Italian), and its efficacy is evaluated through an end-user questionnaire.
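The abstract mentions building corpora with BootCaT and varying its search parameters, without detail. In broad terms the toolkit combines user-supplied seed terms into random tuples, submits each tuple as a search-engine query, and pools the returned URLs for download. The Python sketch below illustrates only the query-generation and URL-pooling steps, under stated assumptions: the seed terms, the defaults (3-term tuples, 10 queries, 10 URLs per query) and all function names are hypothetical, and `search` is a stand-in callable, not a real search API.

```python
import itertools
import random
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple


def make_tuples(seeds: Sequence[str], tuple_len: int = 3, n_tuples: int = 10,
                rng: Optional[random.Random] = None) -> List[Tuple[str, ...]]:
    """Draw up to n_tuples distinct random combinations of seed terms.

    The defaults (3-term tuples, 10 tuples) are illustrative assumptions.
    """
    rng = rng if rng is not None else random.Random()
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_len))
    return rng.sample(combos, min(n_tuples, len(combos)))


def to_query(terms: Iterable[str]) -> str:
    # Quote multi-word seeds so the search engine treats them as phrases.
    return " ".join(f'"{t}"' if " " in t else t for t in terms)


def collect_urls(queries: Iterable[str],
                 search: Callable[[str, int], List[str]],
                 per_query: int = 10) -> List[str]:
    """Pool the top URLs for every query, keeping first-seen order, no repeats.

    `search` is a hypothetical callable (query, k) -> URLs standing in for
    whatever search engine API the toolkit is configured to use.
    """
    pooled: Dict[str, None] = {}
    for q in queries:
        for url in search(q, per_query):
            pooled.setdefault(url, None)
    return list(pooled)


if __name__ == "__main__":
    # Hypothetical English seeds for a patient information leaflet task.
    seeds = ["dosage", "tablets", "side effects", "pregnancy",
             "doctor", "pharmacist"]
    queries = [to_query(t) for t in make_tuples(seeds, 3, 5, random.Random(0))]

    def fake_search(query: str, k: int) -> List[str]:
        # Dummy stand-in for a real search API: one URL per query word.
        words = query.replace('"', "").split()
        return [f"http://example.org/{w}" for w in words][:k]

    for q in queries:
        print(q)
    print(len(collect_urls(queries, fake_search)), "unique URLs")
```

Changing `tuple_len` or `n_tuples` is one example of a simple parameter variation: more tuples typically yield more URLs, while longer tuples typically make each query more restrictive and its results more on-topic.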


Notes

1. This may change in the future, as more tools like Linguee (http://www.linguee.com/) provide access to the aligned web, and possibly to subsections of it.
2. As suggested in the previous section, in this chapter we do not specifically discuss aligned parallel corpora. In our view these are more akin to TMs than to comparable corpora, both in the technical issues involved in their construction and consultation and in the type of insights translators can obtain from them; they are therefore not directly relevant here.
3. “More advanced” corpus querying techniques, such as keyword extraction or the computation of collocational scores, can of course be of great interest to translators. However, their relevance and usefulness may be hard to grasp for less corpus-savvy users, so they are not discussed here.
4. See http://wacky.sslmit.unibo.it/doku.php for information about ukWaC and itWaC.
5. BootCaT currently uses Bing for URL retrieval, after both Google and Yahoo! discontinued their API services.
6. http://www.antlab.sci.waseda.ac.jp/software.html
7. In this paper we define genre (loosely based on Swales [34]) as a recognizable set of communicative events with a shared purpose and common formal features.
8. We used the front-end developed by Eros Zanchetta [37], available at http://bootcat.sslmit.unibo.it/.
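Note 3 mentions keyword extraction but deliberately leaves it aside. For readers curious about what it involves, a common approach compares word frequencies in a specialized (focus) corpus with those in a reference corpus and ranks words by a log-likelihood (G²) score. The Python sketch below is our illustration of that general technique, not the authors' method; it assumes pre-tokenized, non-empty corpora, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word occurring a times in a focus corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the focus corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # by convention a zero count contributes nothing
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(focus: Sequence[str], reference: Sequence[str],
             top: int = 20) -> List[str]:
    """Rank words that are relatively more frequent in `focus` by G2."""
    f, r = Counter(focus), Counter(reference)
    c, d = sum(f.values()), sum(r.values())
    scored = []
    for word, a in f.items():
        b = r[word]  # Counter returns 0 for words absent from the reference
        if a / c > b / d:  # keep only over-represented words
            scored.append((log_likelihood(a, b, c, d), word))
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:top]]


if __name__ == "__main__":
    focus = "take one tablet daily take with water".split()
    reference = "the cat sat on the mat with the dog".split()
    print(keywords(focus, reference))
```

In practice the reference corpus would be much larger than the focus corpus (for instance a general web corpus), which is what makes the frequency comparison informative.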

References

[1] Aston, G.: Corpus use and learning to translate. Textus 12, 289–314 (1999)
[2] Baroni, M., Bernardini, S.: BootCaT: Bootstrapping corpora and terms from the web. In: Proceedings of LREC 2004, pp. 1313–1316. ELDA, Lisbon (2004)
[3] Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang. Resour. Eval. 43(3), 209–226 (2009)
[4] Bernardini, S., Castagnoli, S., Ferraresi, A., Gaspari, F., Zanchetta, E.: Introducing Comparapedia: a new resource for corpus-based translation studies. Paper presented at the International Symposium on Using Corpora in Contrastive and Translation Studies (UCCTS 2010), Edge Hill University, Ormskirk (July 2010)
[5] Biber, D., Conrad, S.: Lexical bundles in conversation and academic prose. In: Hasselgård, H., Oksefjell, S. (eds.) Out of Corpora: Studies in Honour of Stig Johansson, pp. 181–190. Rodopi, Amsterdam (1999)
[6] Bowker, L.: Computer-Aided Translation Technology: A Practical Introduction. University of Ottawa Press, Ottawa (2002)
[7] Bowker, L.: Examining the impact of corpora on terminographic practice in the context of translation. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 211–236. Continuum, London (2011)
[8] Castagnoli, S.: Using the web as a source of LSP corpora in the terminology classroom. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 159–172. GEDIT, Bologna (2006)
[9] Chama, Z.: From segment focus to context orientation. TC World (2010). http://www.tcworld.info/index.php?id=167
[10] Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: Proceedings of COMPLEX 1994, pp. 23–32, Budapest (1994)
[11] Cowie, A.P. (ed.): Phraseology: Theory, Analysis, and Applications. Oxford University Press, Oxford (2001)
[12] Crossley, S.A., Louwerse, M.M.: Multi-dimensional register classification using bi-grams. Int. J. Corpus Linguist. 12(4), 453–478 (2007)
[13] Crowston, K., Kwasnik, B.H.: A framework for creating a facetted classification for genres: addressing issues of multidimensionality. In: Proceedings of the Hawaii International Conference on System Sciences, vol. 4 (2004). http://doi.ieeecomputersociety.org/10.1109/HICSS.2004.1265268
[14] Désilets, A., Melançon, C., Patenaude, G., Brunette, L.: How translators use tools and resources to resolve translation problems: an ethnographic study. In: Proceedings of the MT Summit XII Workshop: Beyond Translation Memories, Ottawa (2009)
[15] Fantinuoli, C.: Specialized corpora from the web and term extraction for simultaneous interpreters. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 173–190. GEDIT, Bologna (2006)
[16] Ferraresi, A., Bernardini, S., Picci, G., Baroni, M.: Web corpora for bilingual lexicography: a pilot study of English-French collocation extraction and translation. In: Xiao, R. (ed.) Using Corpora in Contrastive and Translation Studies, pp. 337–359. Cambridge Scholars Publishing, Newcastle (2010)
[17] Gatto, M.: From Body to Web. An Introduction to the Web as Corpus. Laterza, Bari (2009)
[18] Gavioli, L.: Exploring Corpora for ESP Learning. Benjamins, Amsterdam (2005)
[19] Ghadessy, M., Henry, A., Roseberry, R.L. (eds.): Small Corpus Studies and ELT. Benjamins, Amsterdam (2001)
[20] Goeuriot, L., Morin, M., Daille, B.: Compilation of specialized comparable corpora in French and Japanese. In: Proceedings of the ACL-IJCNLP Workshop on Building and Using Comparable Corpora (BUCC 2009) (2009)
[21] Gries, S.Th., Mukherjee, J.: Lexical gravity across varieties of English: an ICE-based study of n-grams in Asian Englishes. Int. J. Corpus Linguist. 15(4), 520–548 (2010)
[22] Heid, U.: Corpus linguistics and lexicography. In: Kytö, M., Lüdeling, A. (eds.) Corpus Linguistics: An International Handbook, pp. 131–153. Mouton de Gruyter, Berlin (2008)
[23] Hoey, M.: Lexical priming and translation. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 153–168. Continuum, London (2011)
[24] MeLLANGE: Corpora and e-learning questionnaire. Results summary – professionals. Internal document (2006)
[25] MultiTrans: MultiTrans 4™: Taking the multilingual textbase approach to new heights. MultiCorpora White Paper (August 2005). http://www.multicorpora.com/lesNVIAdmin/File/MCwhitepaper1.pdf
[26] Munday, J.: Looming large: a cross-linguistic analysis of semantic prosodies in comparable reference corpora. In: Kruger, A., Wallmach, K., Munday, J. (eds.) Corpus-Based Translation Studies, pp. 169–186. Continuum, London (2011)
[27] Pearson, J.: Terms in Context. Benjamins, Amsterdam (1998)
[28] Pearson, J.: Using parallel texts in the translator training environment. In: Zanettin, F., Bernardini, S., Stewart, D. (eds.) Corpora in Translator Education, pp. 15–24. St Jerome, Manchester (2003)
[29] Philip, G.: Arriving at equivalence: Making a case for comparable general reference corpora in translation studies. In: Beeby, A., Rodríguez Inés, P., Sánchez-Gijón, P. (eds.) Corpus Use and Translating, pp. 59–73. Benjamins, Amsterdam (2009)
[30] Rinsche, A., Zanotti, N.P.: Study on the Size of the Language Industry in the EU. European Commission – Directorate General for Translation, Brussels (2009)
[31] Santini, M.: State-of-the-art on automatic genre identification. Technical Report ITRI-04-03, ITRI, University of Brighton, UK (2004)
[32] Serianni, L.: Grammatica Italiana. UTET, Torino (1991)
[33] Sharoff, S.: Creating general-purpose corpora using automated search engine. In: Baroni, M., Bernardini, S. (eds.) Wacky! Working Papers on the Web as Corpus, pp. 63–98. GEDIT, Bologna
[34] Swales, J.: Genre Analysis. English in Academic and Research Settings. Cambridge University Press, Cambridge (1990)
[35] Varantola, K.: Translators and disposable corpora. In: Zanettin, F., Bernardini, S., Stewart, D. (eds.) Corpora in Translator Education, pp. 55–70. St Jerome, Manchester (2003)
[36] Williams, I.A.: A translator’s reference needs: dictionaries or parallel texts. Target 8, 277–299 (1996)
[37] Zanchetta, E.: Corpora for the masses: the BootCaT front-end. Pecha Kucha presented at the Corpus Linguistics 2011 Conference, University of Birmingham, Birmingham (July 2011)


Acknowledgments

We would like to thank the students and colleagues who kindly agreed to evaluate the URLs for us; Claudia Lecci for her expert insights about TM software; Federico Gaspari for fruitful lunchtime discussions on corpus construction strategies; and the anonymous reviewer and the editors of the volume for their valuable feedback and suggestions.

Author information

Authors and Affiliations

Department of Interpreting and Translation, Corso della Repubblica 136, 47121, Forlì, FC, Italy
Silvia Bernardini & Adriano Ferraresi


Corresponding author

Correspondence to Silvia Bernardini.

Editor information

Editors and Affiliations

Centre for Translation Studies, University of Leeds, Leeds, United Kingdom
Serge Sharoff
University of Mainz, Mainz, Germany
Reinhard Rapp
Université de Paris-Sud LIMSI-CNRS, Orsay, France
Pierre Zweigenbaum
Electronic & Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, People's Republic of China
Pascale Fung


About this chapter

Cite this chapter

Bernardini, S., Ferraresi, A. (2013). Old Needs, New Solutions: Comparable Corpora for Language Professionals. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds) Building and Using Comparable Corpora. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_16


DOI: https://doi.org/10.1007/978-3-642-20128-8_16
Published: 14 December 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20127-1
Online ISBN: 978-3-642-20128-8
eBook Packages: Computer Science, Computer Science (R0)
