Classifying Commas for Patent Machine Translation

Li, Hongzheng; Zhu, Yun

doi:10.1007/978-981-10-3635-4_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 668)

Included in the following conference series:

China Workshop on Machine Translation

Abstract

Commas are used pervasively in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. With Chinese-English patent machine translation as the target application, this paper presents two methods that use rich linguistic information to distinguish commas that separate sub-sentences from commas that do not. The first method employs a word knowledge base and formal rules to determine the roles of commas, while the second uses machine learning approaches. Experimental results show that the overall F1 scores of the rule-based method exceed 93%, indicating that the approach performs well in classifying commas. The classifiers, on the other hand, show some differences. We also conclude that identifying commas can improve the quality of translation output.
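
As a rough illustration of the kind of rule-based comma classification and F1 evaluation the abstract describes, the following Python sketch labels each comma in a pre-segmented sentence. It is not the authors' method: the lexicon PREDICATE_WORDS, the single rule (a comma separates sub-sentences only if the segments on both sides contain a predicate word), the label names SUB and NON, and the example sentence are all assumptions made for illustration.

```python
# Hypothetical toy lexicon standing in for a word knowledge base.
# The paper's actual knowledge base and rules are not given in the abstract.
PREDICATE_WORDS = {"包括", "连接", "设置", "固定", "具有"}

COMMA = "，"


def classify_commas(tokens):
    """Label each comma 'SUB' (separates sub-sentences) or 'NON'.

    Assumed rule: a comma is 'SUB' only when the segment on each side
    of it contains at least one word from PREDICATE_WORDS.
    """
    comma_positions = [i for i, tok in enumerate(tokens) if tok == COMMA]
    # Segment boundaries: sentence start, every comma, sentence end.
    bounds = [-1] + comma_positions + [len(tokens)]
    labels = []
    for k, pos in enumerate(comma_positions):
        left = tokens[bounds[k] + 1:pos]
        right = tokens[pos + 1:bounds[k + 2]]
        has_left = any(tok in PREDICATE_WORDS for tok in left)
        has_right = any(tok in PREDICATE_WORDS for tok in right)
        labels.append("SUB" if has_left and has_right else "NON")
    return labels


def f1_score(gold, pred, positive="SUB"):
    """F1 for one label, computed from gold and predicted label lists."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence, already word-segmented.
tokens = ["该", "装置", "包括", "壳体", "，",
          "壳体", "内", "设置", "电机", "，", "以及", "电源"]
predicted = classify_commas(tokens)
print(predicted)
print(f1_score(["SUB", "NON"], predicted))
```

In this sketch the first comma is labeled SUB because both neighboring segments contain a predicate word, while the second is labeled NON because the trailing segment is a bare coordinated noun phrase. A realistic system would need a far larger knowledge base and more rules than this single check.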

Author information

Authors and Affiliations

Institute of Chinese Information Processing, Beijing Normal University, Beijing, China
Hongzheng Li & Yun Zhu

Corresponding author

Correspondence to Hongzheng Li.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Muyun Yang
Microsoft Research Asia, Beijing, China
Shujie Liu

About this paper

Cite this paper

Li, H., Zhu, Y. (2016). Classifying Commas for Patent Machine Translation. In: Yang, M., Liu, S. (eds) Machine Translation. CWMT 2016. Communications in Computer and Information Science, vol 668. Springer, Singapore. https://doi.org/10.1007/978-981-10-3635-4_8

DOI: https://doi.org/10.1007/978-981-10-3635-4_8
Published: 06 January 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3634-7
Online ISBN: 978-981-10-3635-4
eBook Packages: Computer Science, Computer Science (R0)