Investigating Genre and Method Variation in Translation Using Text Classification

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments described in this paper are carried out in two steps. We trained classifiers to 1) discriminate between different genres (fiction, political essays, etc.); and 2) identify the translation method (machine vs. human). Using semi-delexicalized models (excluding all nouns), we report results of up to 60.5% F-measure in distinguishing human and machine translations and 45.4% in discriminating between seven different genres. More than the classification performance itself, we argue that text classification methods can level out discriminative features of different variables (genres and translation methods) thus enabling researchers to investigate in more detail the properties of each of them.
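
The two ideas the abstract relies on, semi-delexicalization and the F-measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which tagger, tagset, or classifier was used, nor whether nouns were dropped or masked. The STTS-style noun tags (`NN`, `NE`), the `*` placeholder, and the function names `delexicalize` and `f_measure` are all assumptions made for illustration.

```python
# Assumption: STTS-style POS tags, where NN = common noun and NE = proper noun.
NOUN_TAGS = {"NN", "NE"}


def delexicalize(tagged_tokens, placeholder="*"):
    """Mask every noun with a placeholder and keep all other tokens unchanged.

    tagged_tokens is a list of (token, pos_tag) pairs. Masking rather than
    dropping nouns is an assumption; the abstract only says nouns are excluded.
    """
    return [placeholder if tag in NOUN_TAGS else token
            for token, tag in tagged_tokens]


def f_measure(gold, predicted, label):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical POS-tagged German sentence (tags supplied by hand).
sentence = [("Die", "ART"), ("Regierung", "NN"), ("plant", "VVFIN"),
            ("neue", "ADJA"), ("Gesetze", "NN")]
masked = delexicalize(sentence)

# Hypothetical gold and predicted translation-method labels for four texts.
gold = ["human", "machine", "human", "machine"]
predicted = ["human", "human", "human", "machine"]
score = f_measure(gold, predicted, "human")
```

Masking nouns removes most topic-specific vocabulary, so a classifier trained on the masked text has to rely on function words and grammatical patterns rather than on subject matter. That is what makes the remaining discriminative features interpretable for genre and translation method.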

Author information

Corresponding author

Correspondence to Marcos Zampieri.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zampieri, M., Lapshinova-Koltunski, E. (2015). Investigating Genre and Method Variation in Translation Using Text Classification. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_5

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)