Investigating Genre and Method Variation in Translation Using Text Classification

Zampieri, Marcos; Lapshinova-Koltunski, Ekaterina

doi:10.1007/978-3-319-24033-6_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments are carried out in two steps: we train classifiers to (1) discriminate between different genres (fiction, political essays, etc.) and (2) identify the translation method (machine vs. human). Using semi-delexicalized models (excluding all nouns), we report results of up to 60.5% F-measure in distinguishing human and machine translations and 45.4% in discriminating between seven different genres. More than the classification performance itself, we argue that text classification methods can level out discriminative features of different variables (genres and translation methods), thus enabling researchers to investigate the properties of each of them in more detail.
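
As a rough illustration of the two ideas named in the abstract, the sketch below shows one plausible reading of a semi-delexicalized representation (every noun replaced by a placeholder) and how an F-measure for one class can be computed. It is not the authors' implementation: the tag set `NOUN_TAGS`, the `PLACEHOLDER` symbol, the bigram features and all function names are assumptions made for this example, and the input is assumed to be already POS-tagged.

```python
from collections import Counter

# Assumed noun tags (STTS-style common and proper nouns); the abstract
# does not say which tagset or placeholder was used.
NOUN_TAGS = {"NN", "NE"}
PLACEHOLDER = "*"


def delexicalize(tagged_tokens):
    """Replace each noun with a placeholder; lowercase all other tokens."""
    return [PLACEHOLDER if tag in NOUN_TAGS else token.lower()
            for token, tag in tagged_tokens]


def ngram_counts(tokens, n=2):
    """Count word n-grams over a (delexicalized) token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(gold, predicted, label):
    """Harmonic mean of precision and recall for a single class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical pre-tagged German sentence, used only for illustration.
sentence = [("Die", "ART"), ("Regierung", "NN"), ("plant", "VVFIN"),
            ("neue", "ADJA"), ("Gesetze", "NN")]
features = ngram_counts(delexicalize(sentence))

# Hypothetical gold and predicted translation-method labels.
gold = ["human", "machine", "human", "machine"]
predicted = ["human", "human", "human", "machine"]
score = f_measure(gold, predicted, "human")
```

Removing noun identity keeps a classifier from simply learning topic vocabulary, so the remaining features reflect structural and stylistic differences between genres or translation methods. The feature counts could then be passed to any standard classifier.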

Author information

Authors and Affiliations

Saarland University, Saarbrücken, Germany
Marcos Zampieri & Ekaterina Lapshinova-Koltunski
German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
Marcos Zampieri

Corresponding author

Correspondence to Marcos Zampieri.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

Cite this paper

Zampieri, M., Lapshinova-Koltunski, E. (2015). Investigating Genre and Method Variation in Translation Using Text Classification. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_5

DOI: https://doi.org/10.1007/978-3-319-24033-6_5
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)
