Abstract
This paper proposes a method for collecting bilingual technical terms from Japanese-Chinese patent families. The method uses the phrase translation table of a statistical machine translation model to estimate Chinese translations of Japanese technical terms. First, we extract Japanese technical terms from the Japanese side of parallel patent sentences. Then, for each extracted Japanese term, we collect all the sentences that contain it. Next, we generate a Chinese translation of the Japanese technical term by referring to the phrase translation table. Finally, we apply Support Vector Machines (SVMs) to the task of identifying bilingual technical terms. Overall, we achieve over 90% precision at a recall of 60% or higher.
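The translation-generation step can be illustrated with a minimal sketch. It assumes a toy phrase table held in memory and a simple compositional lookup that splits the Japanese term into phrase-table entries, scores each candidate by the product of phrase probabilities, and keeps only candidates found in the aligned Chinese sentences. The table entries, the function names, and the scoring rule are all hypothetical and are not taken from the paper; the SVM identification step is not shown.

```python
from collections import defaultdict

# Toy phrase table: (Japanese phrase, Chinese phrase, translation probability).
# All entries and probabilities are invented for illustration.
PHRASE_TABLE = [
    ("半導体", "半导体", 0.9),
    ("装置", "装置", 0.6),
    ("装置", "设备", 0.3),
    ("半導体装置", "半导体器件", 0.5),
]


def build_index(rows):
    """Map each source phrase to its list of (target phrase, probability)."""
    index = defaultdict(list)
    for src, tgt, prob in rows:
        index[src].append((tgt, prob))
    return index


def generate_candidates(term, index):
    """Enumerate translations of `term` by splitting it into phrase-table
    entries; a candidate's score is the product of the phrase probabilities."""
    if term == "":
        return [("", 1.0)]
    results = []
    for i in range(1, len(term) + 1):
        for tgt, prob in index.get(term[:i], []):
            for rest, rest_prob in generate_candidates(term[i:], index):
                results.append((tgt + rest, prob * rest_prob))
    return results


def attested_candidates(term, zh_sentences, index):
    """Keep candidates that occur in at least one aligned Chinese sentence,
    best score first."""
    best = {}
    for cand, score in generate_candidates(term, index):
        if any(cand in sent for sent in zh_sentences):
            best[cand] = max(score, best.get(cand, 0.0))
    return sorted(best.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    index = build_index(PHRASE_TABLE)
    zh = ["本发明涉及一种半导体器件及其制造方法。"]
    for cand, score in attested_candidates("半導体装置", zh, index):
        print(cand, score)
```

In a pipeline of this shape, the surviving candidate pairs would then be passed to a classifier such as an SVM, which decides which pairs to accept as bilingual technical terms.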
Notes
- 3.
We collect the Japanese-Chinese bilingual technical term pairs generated from an identical Japanese term into a single subset; we do not split them across multiple subsets.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Dong, L., Long, Z., Utsuro, T., Mitsuhashi, T., Yamamoto, M. (2016). Collecting Bilingual Technical Terms from Japanese-Chinese Patent Families by SVM. In: Hasida, K., Purwarianti, A. (eds) Computational Linguistics. PACLING 2015. Communications in Computer and Information Science, vol 593. Springer, Singapore. https://doi.org/10.1007/978-981-10-0515-2_18
DOI: https://doi.org/10.1007/978-981-10-0515-2_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0514-5
Online ISBN: 978-981-10-0515-2
eBook Packages: Computer Science (R0)