Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences

  • Conference paper
New Frontiers in Artificial Intelligence (JSAI-isAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10091)

Abstract

Statutory sentences are generally difficult to read because of their complicated expressions and length. This difficulty is one reason for the low quality of their statistical machine translation (SMT). Multi-word expressions (MWEs) also complicate statutory sentences and lengthen them. Therefore, we proposed a method that utilizes MWEs to improve the SMT of statutory sentences. In our method, we extracted monolingual MWEs from a parallel corpus, automatically acquired their translation equivalents based on the Dice coefficient, and integrated the resulting bilingual MWEs into an SMT system using the single-tokenization strategy. In our experiments, the SMT system using the proposed method significantly improved translation quality. Although automatic acquisition of translation equivalents using the Dice coefficient is not perfect, the best system’s score was close to that of a system that used bilingual MWEs whose equivalents were translated by hand.
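
The two steps named in the abstract, Dice-based equivalent acquisition and single tokenization, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the Dice coefficient is computed from sentence-level co-occurrence counts as 2 * c(s, t) / (c(s) + c(t)), that candidate MWEs on both sides are already given, and that single tokenization means joining the words of an MWE into one token before training. The function names, the underscore joiner, and the romanized toy corpus are all hypothetical.

```python
from collections import Counter


def contains(tokens, phrase):
    """Return True if `phrase` (a tuple of tokens) occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def dice_table(pairs, src_mwes, tgt_mwes):
    """Score each co-occurring (source MWE, target MWE) pair with the Dice coefficient.

    Assumption: counts are per sentence pair, so
    Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)).
    """
    src_n, tgt_n, both_n = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_hits = [s for s in src_mwes if contains(src, s)]
        t_hits = [t for t in tgt_mwes if contains(tgt, t)]
        src_n.update(s_hits)
        tgt_n.update(t_hits)
        both_n.update((s, t) for s in s_hits for t in t_hits)
    return {(s, t): 2 * c / (src_n[s] + tgt_n[t]) for (s, t), c in both_n.items()}


def best_equivalents(scores):
    """Keep, for each source MWE, the target MWE with the highest Dice score."""
    best = {}
    for (s, t), score in scores.items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


def single_tokenize(tokens, mwes, joiner="_"):
    """Join each MWE occurrence into one token (greedy, longest match first)."""
    by_len = sorted(mwes, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for m in by_len:
            if tuple(tokens[i:i + len(m)]) == m:
                out.append(joiner.join(m))
                i += len(m)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


# Invented toy corpus (romanized Japanese, English); not data from the paper.
pairs = [
    ("horitsu ni motozuki sadameru".split(), "prescribe pursuant to the act".split()),
    ("kisoku ni motozuki okonau".split(), "conduct pursuant to the rules".split()),
    ("horitsu no sadame ni yori".split(), "as provided for by the act".split()),
    ("jorei ni motozuki okonau".split(), "conduct as provided for by the ordinance".split()),
]
src_mwes = {("ni", "motozuki"), ("ni", "yori")}
tgt_mwes = {("pursuant", "to"), ("provided", "for", "by")}

scores = dice_table(pairs, src_mwes, tgt_mwes)
best = best_equivalents(scores)
tokenized = single_tokenize(pairs[0][0], src_mwes)
```

In this toy corpus the fourth sentence pair creates a spurious co-occurrence, so the candidate pairs receive different scores and the highest-scoring one is kept. That mirrors, in miniature, why acquisition by Dice coefficient alone can be imperfect. The single-tokenized sentences would then replace the originals in the training data passed to the SMT toolkit.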

Notes

  1. http://www.japaneselawtranslation.go.jp/.

  2. https://github.com/yohasebe/lemmatizer/.

Acknowledgements

This research was partly supported by the Japan Society for the Promotion of Science KAKENHI Grant-in-Aid for Scientific Research (S) No. 23220005, (A) No. 26240050 and (C) No. 15K00201.

Author information

Corresponding author

Correspondence to Yasuhiro Ogawa.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sakamoto, S., Ogawa, Y., Nakamura, M., Ohno, T., Toyama, K. (2017). Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences. In: Otake, M., Kurahashi, S., Ota, Y., Satoh, K., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2015. Lecture Notes in Computer Science, vol 10091. Springer, Cham. https://doi.org/10.1007/978-3-319-50953-2_18

  • DOI: https://doi.org/10.1007/978-3-319-50953-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50952-5

  • Online ISBN: 978-3-319-50953-2

  • eBook Packages: Computer Science, Computer Science (R0)