Abstract
Chinese sentence segmentation is a fundamental step in natural language processing. Reliable sentence boundary detection is a prerequisite for subsequent NLP tasks such as parsing and machine translation. In this paper, we treat the comma as a potential sentence boundary marker and divide commas into two major types: true boundaries (EOS) and pseudo boundaries (Non-EOS). We then present and implement a Chinese sentence segmentation framework based on two-layer classifiers. Experimental results on Chinese Treebank 6.0 show that our model achieves an overall F-measure of 90.7%, an improvement of 1.5%.
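The comma-classification framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' two-layer classifier system: the `segment` function, the `toy_is_eos` rule, and the punctuation sets are hypothetical names chosen for illustration, and `f_measure` assumes the reported score is the balanced F1, which the abstract does not state.

```python
from typing import Callable, List

COMMAS = {",", "，"}            # ASCII and full-width comma
TERMINALS = {"。", "！", "？"}   # punctuation treated here as always sentence-final


def segment(text: str, is_eos: Callable[[str, int], bool]) -> List[str]:
    """Split text after terminal punctuation and after commas labelled EOS.

    `is_eos(text, i)` stands in for any comma classifier (assumption):
    it returns True when the comma at position i ends a sentence.
    """
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch in TERMINALS or (ch in COMMAS and is_eos(text, i)):
            piece = text[start:i + 1].strip()
            if piece:
                sentences.append(piece)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def toy_is_eos(text: str, index: int) -> bool:
    # Placeholder rule, not a trained model: call a comma EOS when the
    # next clause opens with a pronoun subject.
    return text[index + 1:index + 2] in {"他", "她"}


if __name__ == "__main__":
    for sentence in segment("他去了北京，她留在上海，因为工作很忙。", toy_is_eos):
        print(sentence)
```

In this sketch the classifier is passed in as a callable, so a trained model could replace the toy rule without changing the splitting logic.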
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, S., Kong, F., Li, P., Zhu, Q. (2013). A Chinese Sentence Segmentation Approach Based on Comma. In: Ji, D., Xiao, G. (eds) Chinese Lexical Semantics. CLSW 2012. Lecture Notes in Computer Science, vol 7717. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36337-5_82
DOI: https://doi.org/10.1007/978-3-642-36337-5_82
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36336-8
Online ISBN: 978-3-642-36337-5
eBook Packages: Computer Science, Computer Science (R0)