Morphology to the Rescue Redux: Resolving Borrowings and Code-Mixing in Machine Translation

  • Conference paper
Systems and Frameworks for Computational Morphology (SFCM 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 100)

Abstract

In the IBM LMT machine translation system, derivational morphological rules recognize and analyze words that are not found in its source lexicons, and generate default transfers for these unlisted words. Unfound words with no inflectional or derivational affixes are treated as nouns by default. These rules have now been extended to cover a particular set of words that bilingual Spanish-English speakers create on the fly in emails. The approach is characterized by the generation of additional default parts of speech and by the use of morphological, semantic, and syntactic features from both source and target lexicons for analysis and transfer. A built-in rule-based strategy for language borrowing and code-mixing allows the system to recognize words whose frequency of occurrence is variable and unpredictable. Such words would otherwise remain unfound, lowering the accuracy of parsing and the quality of translation output.
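
The fallback order described in the abstract can be illustrated with a minimal Python sketch. It is not the LMT implementation: the lexicon entries, the suffix list, the `Analysis` record, and the `analyze` function are all hypothetical, and the sketch assumes Spanish source text in which an English stem carries a Spanish verbal ending (for example "forwardear"). It only shows the general idea of consulting the source lexicon first, then trying a target-lexicon stem plus a source-language affix, and finally defaulting to a noun.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicons (assumption): word -> (part of speech, transfer).
SPANISH_LEXICON = {"enviar": ("verb", "send"), "correo": ("noun", "mail")}
# Hypothetical English (target) lexicon: word -> part of speech.
ENGLISH_LEXICON = {"forward": "verb", "chat": "verb", "email": "noun"}

# Hypothetical Spanish verbal endings, longest first so that
# "-eando" is tried before shorter endings such as "-eo".
VERB_SUFFIXES = ["eando", "eado", "ear", "eo", "ea"]


@dataclass
class Analysis:
    surface: str         # the word as it appeared in the text
    pos: str             # part of speech assigned to the word
    stem: Optional[str]  # stem found in a lexicon, if any
    transfer: str        # default target-language rendering
    strategy: str        # which rule produced this analysis


def analyze(word: str) -> Analysis:
    w = word.lower()

    # 1. Listed word: use the source lexicon entry directly.
    if w in SPANISH_LEXICON:
        pos, transfer = SPANISH_LEXICON[w]
        return Analysis(word, pos, w, transfer, "source-lexicon")

    # 2. Code-mixed word: English stem plus Spanish verbal ending.
    #    The ending supplies the part of speech; the stem is the transfer.
    for suffix in VERB_SUFFIXES:
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if stem in ENGLISH_LEXICON:
                return Analysis(word, "verb", stem, stem, "code-mixed")

    # 3. No recognizable affix: default to a noun and pass the word through.
    return Analysis(word, "noun", None, word, "default-noun")


if __name__ == "__main__":
    for token in ["correo", "forwardear", "chateo", "blog"]:
        print(analyze(token))
```

In this sketch the source lexicon is consulted before any suffix stripping, so a listed word such as "correo" is never misread as an English stem plus "-eo". A real system would also need inflectional features, semantic types, and syntactic frames from both lexicons, which the toy dictionaries here do not model.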

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manandise, E., Gdaniec, C. (2011). Morphology to the Rescue Redux: Resolving Borrowings and Code-Mixing in Machine Translation. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2011. Communications in Computer and Information Science, vol 100. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23138-4_6

  • DOI: https://doi.org/10.1007/978-3-642-23138-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23137-7

  • Online ISBN: 978-3-642-23138-4

  • eBook Packages: Computer Science, Computer Science (R0)
