Automatic Method to Build a Dictionary for Class-Based Translation Systems

Takai, Kohichi; Hattori, Gen; Yasuda, Keiji; Heracleous, Panikos; Ishikawa, Akio; Matsumoto, Kazunori; Sugaya, Fumiaki

doi:10.1007/978-3-031-23793-5_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13396)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Mistranslating or dropping proper nouns degrades the output quality of machine translation and speech translation. In this paper, we propose a method for building a proper noun dictionary for systems that use class-based language models. The method consists of two parts: building training data and training a word classifier. The first part uses a bilingual corpus that contains proper nouns. For each proper noun, it finds the class that gives the highest sentence-level automatic evaluation score. The second part trains a CNN-based word class classifier on the training data produced by the first part. The training data consist of source-language sentences containing proper nouns, paired with the proper noun classes that give the highest scores. The CNN is trained to predict the proper noun class given the source-side sentence. Although the proposed method requires no manually annotated training data, experimental results on a statistical machine translation system show that the dictionary built by the proposed method achieves performance comparable to that of a manually annotated dictionary.
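As a rough illustration of the training data building part described in the abstract, the following minimal Python sketch picks, for each proper noun, the class whose translation scores highest against the reference. Everything concrete in it is an assumption made for illustration: the class inventory `CLASSES`, the `unigram_f1` metric (a simple stand-in, since the abstract does not name the sentence-level score), the `toy_translate` function standing in for a class-based translation system, and the toy corpus. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical class inventory; the abstract does not list the actual classes.
CLASSES = ["person", "place", "organization"]


def unigram_f1(hypothesis, reference):
    """Stand-in sentence-level score: unigram F1 between two token strings."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped count of tokens shared by hypothesis and reference.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_class(source, reference, proper_noun, translate,
               classes=CLASSES, score=unigram_f1):
    """Return the class whose translation scores highest against the reference."""
    return max(
        classes,
        key=lambda c: score(translate(source, proper_noun, c), reference),
    )


def build_training_data(corpus, translate, classes=CLASSES):
    """Build (source sentence, proper noun, best class) triples for a classifier."""
    return [
        (src, noun, best_class(src, ref, noun, translate, classes))
        for src, ref, noun in corpus
    ]


def toy_translate(source, proper_noun, word_class):
    """Toy stand-in for a class-based translation system (illustrative only)."""
    templates = {
        "person": "i will meet {n} tomorrow",
        "place": "i will go to {n} tomorrow",
        "organization": "i will join {n} tomorrow",
    }
    return templates[word_class].format(n=proper_noun)


# Toy bilingual corpus: (source sentence, reference translation, proper noun).
corpus = [("ashita kyoto ni ikimasu", "i will go to kyoto tomorrow", "kyoto")]
training_data = build_training_data(corpus, toy_translate)
print(training_data)
```

In this toy setting, the resulting triples would serve as labeled examples for the word class classifier, with the source sentence as input and the selected class as the target label.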

Notes

1.
The actual hyperparameter settings differ from the example shown in the figure. The detailed settings are explained in Sect. 4.

Acknowledgments

This research is supported by the Japanese Ministry of Internal Affairs and Communications as a Global Communication Project.

Author information

Authors and Affiliations

KDDI Research, Inc., Garden Air Tower, 3-10-10, Iidabashi, Chiyoda-ku, Tokyo, 102-8460, Japan
Kohichi Takai, Gen Hattori, Keiji Yasuda, Panikos Heracleous, Akio Ishikawa, Kazunori Matsumoto & Fumiaki Sugaya
MINDWORD Inc., 7-19-11 Nishishinjuku, Shinjuku-ku, Tokyo, 160-0023, Japan
Fumiaki Sugaya

Corresponding author

Correspondence to Kohichi Takai.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Takai, K. et al. (2023). Automatic Method to Build a Dictionary for Class-Based Translation Systems. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_24

DOI: https://doi.org/10.1007/978-3-031-23793-5_24
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23792-8
Online ISBN: 978-3-031-23793-5
eBook Packages: Computer Science, Computer Science (R0)
