Abstract
Most neural machine translation (NMT) models take subword-level sequences as input to address the problem of representing out-of-vocabulary (OOV) words. However, subword units may omit information carried at larger text granularities, such as named entities, leading to a loss of important semantic information. In this paper, we propose a simple but effective method to incorporate named entity (NE) tag information into the Transformer translation system. The encoder of our model takes both the subwords and the NE tags of source sentences as input. Furthermore, we introduce a novel entity-aligned attention mechanism that makes full use of the chunk information carried by NE tags. The proposed approach can be easily integrated into the existing Transformer framework. Experimental results on two public translation tasks demonstrate that our method achieves significant improvements over the base Transformer model and also outperforms existing competitive systems.
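The abstract does not reproduce the model details, so the sketch below is only one plausible reading of its two ideas: NE-tag embeddings summed with subword embeddings at the encoder input, and attention scores mean-pooled within each entity chunk so that a named entity is attended to as a single unit. All names here are illustrative, and the element-wise sum and mean-pooling are assumptions, not the authors' confirmed formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NEAwareEmbedding(nn.Module):
    """Sketch: fuse subword and NE-tag embeddings at the encoder input.

    Assumption: fusion by element-wise sum; the paper may instead use
    concatenation plus projection. Positional encoding is omitted here.
    """
    def __init__(self, vocab_size: int, tag_vocab_size: int, d_model: int):
        super().__init__()
        self.subword_emb = nn.Embedding(vocab_size, d_model)
        self.tag_emb = nn.Embedding(tag_vocab_size, d_model)

    def forward(self, subword_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        # subword_ids, tag_ids: (batch, seq_len); one NE tag per subword position
        return self.subword_emb(subword_ids) + self.tag_emb(tag_ids)


def entity_aligned_scores(scores: torch.Tensor, chunk_ids: torch.Tensor) -> torch.Tensor:
    """Sketch of entity-aligned attention: average pre-softmax attention
    scores over the subwords of each entity chunk, so every position in a
    chunk receives the same score.

    scores:    (batch, heads, q_len, k_len) attention logits
    chunk_ids: (batch, k_len) long tensor; subwords of one named entity share
               a chunk id, all other positions get unique ids
    """
    n_chunks = int(chunk_ids.max().item()) + 1
    one_hot = F.one_hot(chunk_ids, n_chunks).float()        # (b, k, c) membership
    counts = one_hot.sum(dim=1).clamp(min=1.0)              # (b, c) chunk sizes
    # Mean score per chunk, then scatter the mean back to member positions.
    chunk_mean = torch.einsum('bhqk,bkc->bhqc', scores, one_hot) / counts[:, None, None, :]
    return torch.einsum('bhqc,bkc->bhqk', chunk_mean, one_hot)
```

Under this reading, the pooled scores would replace the raw logits before the usual softmax in a Transformer attention head, so positions belonging to the same entity receive identical attention weight.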
Acknowledgments
This research work was funded by the National Key Research and Development Program of China (Nos. 2016QY03D0604 and 2018YFC0830803) and the National Natural Science Foundation of China (Grant No. 61772337).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, L., Lu, W., Zhou, J., Meng, K., Liu, G. (2020). Incorporating Named Entity Information into Neural Machine Translation. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_31
DOI: https://doi.org/10.1007/978-3-030-60450-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)