Abstract
Commas are used frequently in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. Targeting Chinese-English patent machine translation, this paper presents two methods that use rich linguistic information to classify commas as separating sub-sentences or non-sub-sentences. The first method uses a word knowledge base and formal rules to determine the role of each comma, while the second uses machine learning approaches. Experimental results show that the overall F1 scores of the rule-based method exceed 93%, indicating that the approach classifies commas well. The machine learning classifiers, on the other hand, differ somewhat in their results. We also conclude that identifying the roles of commas improves the quality of translation output.
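As a reminder of how an F1 score for a two-way comma classification can be computed, the following is a minimal sketch. The label names `SUB` and `NON`, the function `f1_per_class`, and the toy gold and predicted sequences are all hypothetical illustrations; they are not the paper's data, labels, or evaluation code.

```python
def f1_per_class(gold, pred, label):
    """Return the F1 score for one label, given parallel gold/predicted lists."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: "SUB" marks a comma that separates sub-sentences,
# "NON" marks a comma that does not.
gold = ["SUB", "NON", "SUB", "SUB", "NON"]
pred = ["SUB", "NON", "NON", "SUB", "SUB"]

for label in ("SUB", "NON"):
    print(label, round(f1_per_class(gold, pred, label), 3))
```

Scoring each class separately, as here, makes it visible when a classifier does well on one comma type and poorly on the other, which a single pooled figure can hide.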
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H., Zhu, Y. (2016). Classifying Commas for Patent Machine Translation. In: Yang, M., Liu, S. (eds) Machine Translation. CWMT 2016. Communications in Computer and Information Science, vol 668. Springer, Singapore. https://doi.org/10.1007/978-981-10-3635-4_8
DOI: https://doi.org/10.1007/978-981-10-3635-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3634-7
Online ISBN: 978-981-10-3635-4
eBook Packages: Computer Science, Computer Science (R0)