Controlling Formality and Style of Machine Translation Output Using AutoML

  • Conference paper
Information Management and Big Data (SIMBig 2019)

Abstract

An often overlooked difficulty of machine translation is producing a consistent formality (or register) in the target language. This is especially hard when the source language has fewer levels of formality than the target language. We take a transfer learning approach, using Google’s AutoML Translate to train custom neural machine translation (NMT) models that consistently produce a specific formality. We experiment with formality levels for English to Spanish, English to French and English to Czech. This approach makes it possible to obtain better and more consistent in-context translation while still leveraging the strength of a general-purpose machine translation system.

Notes

  1. See https://cloud.google.com/translate/automl/docs/ for official Google documentation.

  2. For example, in Spanish, if a target segment contains any of the words listed in Column 1 of Table 1, that is a sufficient condition to determine that its register is informal (T). However, determining that a target segment is in the formal register (V) is more challenging, because some of the words that signal the formal register are also used to refer to the 3rd person (e.g. Spanish suyo can mean English formal “yours” or 3rd person “his”). To resolve this, we filter for segment pairs where the target segment contains (V) markers and the source segment contains at least one English inclusion word such as a 2nd person pronoun (e.g. “you”, “yours”) and no English exclusion words such as 3rd person pronouns (e.g. “her”, “she”, “them”). This combined rule is a sufficient condition to determine that the register of the target segment is formal (V).

  3. Segments may consist of either a single sentence or multiple sentences.
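
The register rule described in Note 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Table 1 is not reproduced here, so the Spanish marker lists below are hypothetical stand-ins, and the English exclusion list extends the note's examples (“her”, “she”, “them”) with a few assumed 3rd person pronouns. The function name `classify_register` is likewise made up for this sketch.

```python
import re
from typing import Optional, Set

# Hypothetical marker lists: the paper's Table 1 is not available here,
# so these Spanish words are assumptions chosen only for illustration.
INFORMAL_T_MARKERS = {"tú", "te", "ti", "contigo", "tuyo", "tuya"}
FORMAL_V_MARKERS = {"usted", "suyo", "suya"}

# English inclusion/exclusion words. The note gives "you", "yours", "her",
# "she" and "them" as examples; the remaining entries are assumed additions.
INCLUSION_WORDS = {"you", "your", "yours"}
EXCLUSION_WORDS = {"he", "him", "his", "she", "her", "hers", "they", "them"}


def tokens(segment: str) -> Set[str]:
    """Lowercase a segment and return its set of word tokens."""
    return set(re.findall(r"\w+", segment.lower()))


def classify_register(source_en: str, target_es: str) -> Optional[str]:
    """Return "T" (informal), "V" (formal), or None when neither rule fires."""
    src, tgt = tokens(source_en), tokens(target_es)
    # Any informal marker in the target is sufficient for T.
    if tgt & INFORMAL_T_MARKERS:
        return "T"
    # V needs a formal marker in the target, a 2nd person word in the
    # source, and no 3rd person word in the source.
    if (tgt & FORMAL_V_MARKERS) and (src & INCLUSION_WORDS) and not (src & EXCLUSION_WORDS):
        return "V"
    return None


print(classify_register("How are you?", "¿Cómo estás tú?"))   # T
print(classify_register("Is this yours?", "¿Es esto suyo?"))  # V
print(classify_register("This is his.", "Esto es suyo."))     # None
```

Returning None for undecided pairs mirrors the note's framing of the rules as sufficient conditions: a pair that matches neither rule is simply left unlabelled rather than forced into a register.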

Author information

Corresponding authors

Correspondence to Aditi Viswanathan, Varden Wang or Antonina Kononova.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Viswanathan, A., Wang, V., Kononova, A. (2020). Controlling Formality and Style of Machine Translation Output Using AutoML. In: Lossio-Ventura, J.A., Condori-Fernandez, N., Valverde-Rebaza, J.C. (eds) Information Management and Big Data. SIMBig 2019. Communications in Computer and Information Science, vol 1070. Springer, Cham. https://doi.org/10.1007/978-3-030-46140-9_29

  • DOI: https://doi.org/10.1007/978-3-030-46140-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46139-3

  • Online ISBN: 978-3-030-46140-9

  • eBook Packages: Computer Science, Computer Science (R0)
