Small but Efficient: The Misconception of High-Frequency Words in Scandinavian Translation

  • Conference paper
Envisioning Machine Translation in the Information Future (AMTA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1934)

Abstract

Machine translation has proved easier between closely related languages, such as German and English, while more distant language pairs, such as Chinese and English, encounter many more problems. The present study focuses on Swedish and Norwegian, two languages so closely related that they would be regarded as dialects were it not for the fact that each has a royal house and an army attached to it. Despite their similarity, some differences make translation much less straightforward than might be expected. Starting from sentence-aligned parallel texts, this study aims to highlight some of these differences and to formalise the results. To do so, the texts were aligned on smaller units using a simple cognate alignment method. Unsurprisingly, longer words were easier to align, while shorter, often high-frequency words proved problematic. Likewise, when trying to align to a specific word sense in a dictionary, content words gave better results. We therefore abandoned single-word units and searched for multi-word units whenever possible. This study reinforces the view that machine translation should rest on methods based on multi-word unit searches.
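
The abstract does not say which cognate measure was used, so the following is only an illustrative sketch of what a simple cognate aligner might look like, not the authors' method. It assumes a character-level similarity score (Python's `difflib` ratio) and an arbitrary acceptance threshold of 0.7; the function names `similarity` and `align_cognates` are hypothetical. The example pair is the Swedish title "Händelser vid vatten" and its Norwegian counterpart "Hendelser ved vann".

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: difflib's ratio stands in for whatever cognate
    measure the study actually used.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_cognates(src_tokens, tgt_tokens, threshold=0.7):
    """Greedily pair each source token with its most similar unused target token.

    A pair is kept only if its score reaches `threshold` (a hypothetical
    cut-off chosen for illustration).
    """
    pairs = []
    used = set()  # indices of target tokens already aligned
    for s in src_tokens:
        best_j, best_score = None, 0.0
        for j, t in enumerate(tgt_tokens):
            if j in used:
                continue
            score = similarity(s, t)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            pairs.append((s, tgt_tokens[best_j]))
    return pairs


swedish = ["Händelser", "vid", "vatten"]
norwegian = ["Hendelser", "ved", "vann"]
print(align_cognates(swedish, norwegian))
```

Under these assumptions, one differing letter leaves the nine-letter pair "Händelser"/"Hendelser" at a score of about 0.89, while the same single-letter difference in "vid"/"ved" drops the score to about 0.67, below the threshold. This is one way to see why short, high-frequency words are harder to align by form alone.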

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Danielsson, P., Mühlenbock, K. (2000). Small but Efficient: The Misconception of High-Frequency Words in Scandinavian Translation. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_16

  • DOI: https://doi.org/10.1007/3-540-39965-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41117-8

  • Online ISBN: 978-3-540-39965-0

  • eBook Packages: Springer Book Archive
