Abstract
This paper describes our system for Chinese word segmentation of micro-blog text, one of the NLPCC-ICCPOL 2016 Shared Tasks [1]. A CRF (Conditional Random Field) model is employed to treat word segmentation as a sequence labeling problem, and seven sets of features are selected to train it. Under the weighted measures [2], the system achieves an \(f_{b}\) score of 0.798144 on the closed track, 0.81968 on the semi-open track, and 0.82217 on the open track.
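To illustrate what treating segmentation as sequence labeling means, the following is a minimal sketch that converts a segmented sentence into one tag per character and back. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single), which the abstract does not specify and which may differ from the tag set actually used; the function names `words_to_tags` and `tags_to_words` are hypothetical. A CRF would be trained to predict such tags from character-level features; the model itself is not shown here.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character.

    Assumption: four-tag scheme (B=begin, M=middle, E=end, S=single).
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by the tagger
        words.append(current)
    return words


segmented = ["我", "喜欢", "北京市"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Because the mapping is reversible, predicting a tag sequence for a raw character string is equivalent to predicting its segmentation.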
References
Qiu, X., Qian, P., Shi, Z., Wu, S.: Overview of the NLPCC 2016 Shared Task: Chinese Word Segmentation for Micro-Blog Texts
Qian, P., Qiu, X., Huang, X.: A new psychometric-inspired evaluation metric for Chinese word segmentation. In: Meeting of the Association for Computational Linguistics (2016)
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34(1), 1–47 (2002)
Jin, K.L., Ng, H.T., Guo, W.: A maximum entropy approach to Chinese word segmentation. In: Proceedings of 4th SIGHAN Workshop on Chinese Language Processing (2005)
Peng, F., Feng, F., McCallum, A.: Chinese segmentation, new word detection using conditional random fields. In: Proceedings of COLING, pp. 562–568 (2004)
Chen, X., Qiu, X., Zhu, C., et al.: Long short-term memory neural networks for Chinese word segmentation. In: Conference on Empirical Methods in Natural Language Processing (2015)
Zhao, H., Li, M., Lu, B.L., et al.: Effective tag set selection in Chinese word segmentation via conditional random field modeling. In: 20th Pacific Asia Conference on Language, Information, Computation, pp. 87–94 (2006)
Yan, J.: Research and application of Chinese word segmentation based on conditional random fields (2009). (in Chinese)
Gao, Q., Vogel, S.: A multi-layer Chinese word segmentation system optimized for out-of-domain tasks (2010)
Wu, G., et al.: Leveraging rich linguistic features for cross-domain Chinese segmentation. In: CIPS-SIGHAN Joint Conference on Chinese Language Processing (2014)
Emerson, T.: The second international Chinese word segmentation bakeoff. In: Proceedings of 4th SIGHAN Workshop on Chinese Language Processing, p. 133 (2005)
Goutte, C., Gaussier, E.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 345–359. Springer, Heidelberg (2005). doi:10.1007/978-3-540-31865-1_25
Acknowledgments
This work was partially supported by the Natural Science Foundation of China (No. 61273365), the discipline building plan in 111 base (No. B08004), the Engineering Research Center of Information Networks of MOE, and the Co-construction Program with the Beijing Municipal Commission of Education.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Leng, Y., Liu, W., Wang, S., Wang, X. (2016). A Feature-Rich CRF Segmenter for Chinese Micro-Blog. In: Lin, C.-Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_78
DOI: https://doi.org/10.1007/978-3-319-50496-4_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)