Abstract
Word segmentation is the first step in Chinese natural language processing, and its accuracy has a substantial impact on subsequent tasks such as part-of-speech tagging and semantic analysis. This research explores Chinese word segmentation based on Conditional Random Fields (CRFs). First, we apply different character templates to conduct feature selection: the baseline and eight templates are combined to construct nine groups of features. Tsai's feature set and our proposed feature set are then introduced to refine the experimental results. Moreover, several model parameters are adjusted to make the CRF model more effective and efficient. Finally, different character tag sets are evaluated on different datasets, and the 2-tag set is chosen for character tagging in this research. Experimental results demonstrate that CRF-based Chinese word segmentation with the proposed feature set achieves the best F-score.
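To make the character-tagging formulation concrete, the following is a minimal sketch of 2-tag labeling and window-based character templates. It is not the paper's implementation. The tag names `B` (word-initial character) and `I` (any other character), the two-character window, the `<PAD>` marker, and all function names are assumptions made for illustration, since the abstract does not specify the templates or the tag inventory.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character 'B' if it starts a word, otherwise 'I' (assumed 2-tag set)."""
    tags: List[str] = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from a character sequence and its tags."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


def char_features(chars: str, i: int) -> Dict[str, str]:
    """Hypothetical unigram and bigram templates over characters i-2 .. i+2."""
    def at(k: int) -> str:
        j = i + k
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"C{k}": at(k) for k in (-2, -1, 0, 1, 2)}
    feats.update({f"C{k}C{k + 1}": at(k) + at(k + 1) for k in (-2, -1, 0, 1)})
    return feats


segmented = ["中文", "分词", "很", "重要"]
sentence = "".join(segmented)
tags = words_to_tags(segmented)
features = [char_features(sentence, i) for i in range(len(sentence))]
```

In a setup like this, each per-character feature dictionary would be paired with its tag to train a sequence labeler, and `tags_to_words` would turn predicted tags back into a segmentation.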
References
Chen, X., Qiu, X., Zhu, C., Huang, X.: Gated recursive neural network for Chinese word segmentation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1744–1753 (2015)
Yao, Y., Huang, Z.: Bi-directional LSTM recurrent neural network for Chinese word segmentation. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9950, pp. 345–353. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46681-1_42
Wang, X., Li, C., Chen, J.: Chinese word segmentation method based on expanded convolutional neural network model. J. Chin. Inform. 33(9), 154–161 (2019)
Xue, N.: Chinese word segmentation as character tagging. Comput. Linguist. Chin. Lang. 8(1), 29–47 (2003)
Low, J., Ng, H., Guo, W.: A maximum entropy approach to Chinese word segmentation. In: Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, pp. 161–164 (2005)
Zhao, H., Huang, C., Li, M.: An improved Chinese word segmentation system with conditional random field. In: Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing, pp. 162–165 (2006)
Mao, X., et al.: Chinese word segmentation and named entity recognition based on conditional random fields. In: Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing (2008)
Tsai, R., et al.: On closed task of Chinese word segmentation: an improved CRF model coupled with character clustering and automatically generated template matching. In: Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing, pp. 108–117 (2006)
Huang, W., et al.: Toward fast and accurate neural Chinese word segmentation with multi-criteria learning. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 2062–2072 (2020)
Deng, L., Luo, Z.: Domain adaptation of Chinese word segmentation on semi-supervised conditional random fields. J. Chin. Inform. 31(4), 9–19 (2017)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML) (2001)
Peng, F., Feng, F., McCallum, A.: Chinese segmentation and new word detection using conditional random fields. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING, pp. 562–568 (2004)
Huang, C., Zhao, H.: Chinese word segmentation: a decade review. J. Chin. Inform. Process. 21(3), 8–19 (2007)
Acknowledgement
This work was supported by the High-level Innovation and Entrepreneurship Talents Introduction Program of Jiangsu Province of China, 2019.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fan, C., Li, Y. (2021). Research on Chinese Word Segmentation Based on Conditional Random Fields. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837. Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_27
DOI: https://doi.org/10.1007/978-3-030-84529-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84528-5
Online ISBN: 978-3-030-84529-2
eBook Packages: Computer Science, Computer Science (R0)