Classifying Commas for Patent Machine Translation

  • Conference paper
Machine Translation (CWMT 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 668)

Abstract

Commas are used widely in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. Targeting Chinese-English patent machine translation, this paper presents two methods that use rich linguistic information to distinguish commas that separate sub-sentences from those that separate non-sub-sentences. The first method employs a word knowledge base and formal rules to determine the roles of commas, while the second uses machine learning approaches. Experimental results show that the overall F1 scores of the rule-based method are higher than 93%, indicating that the approach performs well in classifying commas. The classifiers, on the other hand, show some differences. We also conclude that identifying commas can improve the quality of translation output.
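
The abstract reports results as F1 scores for a two-way comma classification. As a minimal sketch of how such a score is computed, the snippet below evaluates per-class precision, recall, and F1 on a toy example. The label names "SUB" and "NON", the function name, and the sample data are all hypothetical and are not taken from the paper; the paper's actual evaluation setup may differ.

```python
def f1_per_class(gold, pred, label):
    """Return the F1 score of `label`, given parallel gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: "SUB" marks a comma that separates sub-sentences,
# "NON" marks a comma that separates non-sub-sentence units.
gold = ["SUB", "NON", "SUB", "SUB", "NON", "NON"]
pred = ["SUB", "NON", "NON", "SUB", "NON", "SUB"]

for label in ("SUB", "NON"):
    print(label, round(f1_per_class(gold, pred, label), 3))
```

In this toy data each class has two correct predictions, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3 for both classes.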

Notes

  1. http://www.sipo.gov.cn/

  2. http://mallet.cs.umass.edu/

  3. http://research.nii.ac.jp/ntcir/permission/ntcir-9/perm-en-PatentMT.html

Author information

Corresponding author

Correspondence to Hongzheng Li.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, H., Zhu, Y. (2016). Classifying Commas for Patent Machine Translation. In: Yang, M., Liu, S. (eds) Machine Translation. CWMT 2016. Communications in Computer and Information Science, vol 668. Springer, Singapore. https://doi.org/10.1007/978-981-10-3635-4_8

  • DOI: https://doi.org/10.1007/978-981-10-3635-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3634-7

  • Online ISBN: 978-981-10-3635-4

  • eBook Packages: Computer Science, Computer Science (R0)
