Terminology Translation Error Identification and Correction

Liu, Mengyi; Tang, Jian; Hong, Yu; Yao, Jianmin

doi:10.1007/978-981-10-6805-8_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)

Included in the following conference series:

Chinese National Conference on Social Media Processing

Abstract

A statistical machine translation (SMT) system requires homogeneous training data in order to produce domain-sensitive terminology translations. If the training data mixes multiple domains, it is difficult for the SMT system to learn the translation probabilities of context-sensitive terminology. Terminology translation is nevertheless important for SMT. Previous work mainly focuses on integrating terminology into machine translation systems and relies heavily on domain terminology resources. In this paper, we propose a back-translation-based method to identify terminology translation errors in SMT outputs and automatically suggest a better translation. Our approach is simple, requires no external resources, and can be applied to any type of SMT system. We use three metrics to measure the quality of the back translation: tree-edit distance, sentence semantic similarity, and language model perplexity. Experimental results show that our method improves performance on both weak and strong SMT systems, yielding a precision of 0.48% and 1.51%, respectively.
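
The back-translation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy lookup table that stands in for a reverse translation system, and the token-overlap (Jaccard) similarity are all assumptions made for illustration. The paper itself scores back translations with tree-edit distance, sentence semantic similarity, and language model perplexity, and any of those could be passed in as the `similarity` argument.

```python
from typing import Callable, Dict, List, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a crude stand-in for the paper's metrics."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_by_back_translation(
    source: str,
    candidates: List[str],
    back_translate: Callable[[str], str],
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> List[Tuple[str, float]]:
    """Score each candidate translation by how closely its back translation matches the source."""
    scored = [(c, similarity(source, back_translate(c))) for c in candidates]
    # Highest similarity first; sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical lookup table standing in for a real reverse MT system.
TOY_BACK_TRANSLATIONS: Dict[str, str] = {
    "target sentence A": "the bank raised the interest rate",
    "target sentence B": "the river bank raised the interest",
}


def toy_back_translate(text: str) -> str:
    """Return the canned back translation for a candidate (illustration only)."""
    return TOY_BACK_TRANSLATIONS[text]


ranking = rank_by_back_translation(
    "the bank raised the interest rate",
    ["target sentence B", "target sentence A"],
    toy_back_translate,
)
best_candidate, best_score = ranking[0]
```

In this toy run the candidate whose back translation reproduces the source sentence ranks first, while the candidate whose back translation introduces a wrong sense of "bank" scores lower. A low score of this kind is the signal that a terminology translation may be wrong and that an alternative candidate should be suggested.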

Notes

1. http://translate.google.cn
2. http://www.statmt.org/moses
3. http://fanyi.youdao.com/openapi?path=data-mode
4. https://nlp.stanford.edu/software/nndep.shtml
5. https://docs.python.org/3/library/urllib.html
6. https://www.crummy.com/software/BeautifulSoup/
7. http://opennlp.apache.org/
8. LDC2003T09 Gigaword Chinese Text Corpus Second Edition.
9. LDC2009T13 Xinhua News Portion of English Gigaword Second Edition.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants No. 61373097, No. 61672367, No. 61672368, and No. 61331011), the Research Foundation of the Ministry of Education and China Mobile (MCM20150602), and the Science and Technology Plan of Jiangsu (SBK2015022101). The authors would like to thank the anonymous reviewers for their insightful comments and suggestions.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China
Mengyi Liu, Jian Tang, Yu Hong & Jianmin Yao

Corresponding author

Correspondence to Yu Hong.

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xueqi Cheng
Beijing Jinri Toutiao Technology Co. Ltd, Beijing, China
Weiying Ma
Arizona State University, Tempe, Arizona, USA
Huan Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Huawei Shen
Renmin University of China, Beijing, China
Shizheng Feng
Microsoft Asia Research, Beijing, China
Xing Xie

Cite this paper

Liu, M., Tang, J., Hong, Y., Yao, J. (2017). Terminology Translation Error Identification and Correction. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_12

DOI: https://doi.org/10.1007/978-981-10-6805-8_12
Published: 26 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8
eBook Packages: Computer Science, Computer Science (R0)
