Abstract
A statistical machine translation (SMT) system requires homogeneous training data to translate domain-sensitive terminology correctly. When the training data mixes several domains, it is difficult for an SMT system to learn the translation probabilities of context-sensitive terms. Yet terminology translation is important for SMT. Previous work mainly focuses on integrating terminology into machine translation systems and relies heavily on domain terminology resources. In this paper, we propose a back-translation-based method that identifies terminology translation errors in SMT output and automatically suggests better translations. Our approach is simple, requires no external resources, and can be applied to any type of SMT system. We use three metrics to measure the quality of a back translation: tree edit distance, sentence semantic similarity, and language model perplexity. Experimental results show that our method improves performance on both weak and strong SMT systems, yielding a precision of 0.48% and 1.51%, respectively.
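The abstract names the three back-translation quality metrics but does not define them. The Python sketch below shows one plausible way to compute each score for a source sentence and its back translation. It is an illustration only, not the authors' implementation. Assumptions: parse trees are hypothetical nested `(label, children)` tuples; bag-of-words count vectors stand in for whatever learned sentence embedding the method actually uses; per-token natural-log probabilities are assumed to come from an external language model that is not shown; and because the abstract does not say how the three scores are combined, the sketch stops at computing them.

```python
import math
from collections import Counter
from functools import lru_cache

# Hypothetical tree encoding (an assumption, not from the paper):
# a tree is (label, children) where children is a tuple of trees, and a
# forest is a tuple of trees. Tuples are hashable, so lru_cache can memoize.


def forest_size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children take its place.
        forest_edit_distance(f[:-1] + v_kids, g) + 1,
        # Insert the rightmost root of g.
        forest_edit_distance(f, g[:-1] + w_kids) + 1,
        # Map the two rightmost roots onto each other (cost 1 if the labels
        # differ), then align their subtrees and the remaining forests.
        forest_edit_distance(v_kids, w_kids)
        + forest_edit_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_edit_distance(t1, t2):
    """Unit-cost ordered tree edit distance between two trees."""
    return forest_edit_distance((t1,), (t2,))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bag_of_words_vectors(s1, s2):
    """Stand-in sentence vectors: word counts over the shared vocabulary."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    vocab = sorted(set(c1) | set(c2))
    return [c1[w] for w in vocab], [c2[w] for w in vocab]


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Toy parse trees for a source sentence and its back translation.
    np_leaf = ("NP", ())
    v_leaf = ("V", ())
    src_tree = ("S", (np_leaf, ("VP", (v_leaf, np_leaf))))
    bt_tree = ("S", (np_leaf, ("VP", (v_leaf,))))
    print(tree_edit_distance(src_tree, bt_tree))  # 1 (one NP leaf deleted)

    u, v = bag_of_words_vectors("the bank approved the loan",
                                "the bank approved the credit")
    print(round(cosine_similarity(u, v), 3))  # 0.857

    # In practice these log-probabilities would come from a real LM.
    print(round(perplexity([math.log(0.5)] * 4), 3))  # 2.0
```

The tree edit distance here is the plain recursive forest formulation with unit costs: it is exact but can be slow on large parse trees, so a real system would more likely use an algorithm with better complexity guarantees, such as Zhang and Shasha's.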
Notes
- 8. LDC2003T09 Gigaword Chinese Text Corpus Second Edition.
- 9. LDC2009T13 Xinhua News Portion of English Gigaword Second Edition.
Acknowledgments
This research work is supported by the National Natural Science Foundation of China (Grant Nos. 61373097, 61672367, 61672368, and 61331011), the Research Foundation of the Ministry of Education and China Mobile (MCM20150602), and the Science and Technology Plan of Jiangsu (SBK2015022101). The authors would like to thank the anonymous reviewers for their insightful comments and suggestions.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, M., Tang, J., Hong, Y., Yao, J. (2017). Terminology Translation Error Identification and Correction. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_12
DOI: https://doi.org/10.1007/978-981-10-6805-8_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8
eBook Packages: Computer Science; Computer Science (R0)