Research on Chinese Word Segmentation Based on Conditional Random Fields

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12837)

Abstract

Word segmentation is the first step in Chinese natural language processing, and its accuracy has a substantial impact on subsequent tasks such as part-of-speech tagging and semantic analysis. This research explores Chinese word segmentation based on Conditional Random Fields (CRFs). First, we apply different character templates to conduct feature selection: a baseline and eight templates are combined to construct nine groups of features. In addition, Tsai’s feature set and our proposed feature set are introduced to refine the experimental results. Moreover, several parameters of the model are adjusted to make the CRF model more effective and efficient. Finally, different character tag sets are evaluated on different datasets, and the 2-Tag set is chosen for character tagging in this research. Experimental results demonstrate that CRF-based Chinese word segmentation with the proposed feature set achieves the best F score.
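To make the character-tagging formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the 2-Tag set is {B, I} (B begins a word, I continues one) and uses a five-character window of unigram and bigram templates, a common choice in CRF segmenters. The abstract does not list the actual templates, so the feature names (`C-2` ... `C2`), the `<PAD>` symbol, and all helper names are hypothetical. The word-level F score is computed over matching word spans.

```python
from typing import Dict, List, Set, Tuple


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character B (begins a word) or I (continues a word)."""
    tags: List[str] = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from a character string and its B/I tags."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)      # start a new word
        else:
            words[-1] += ch       # extend the current word
    return words


def char_features(chars: str, i: int) -> Dict[str, str]:
    """Assumed templates: unigrams C-2..C2 and adjacent bigrams around position i."""
    def at(k: int) -> str:
        j = i + k
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"C{k}": at(k) for k in (-2, -1, 0, 1, 2)}
    feats.update({f"C{k}C{k + 1}": at(k) + at(k + 1) for k in (-2, -1, 0, 1)})
    return feats


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into (start, end) character offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Word-level F score: harmonic mean of span precision and recall."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["我们", "喜欢", "自然语言"]
sentence = "".join(gold)
tags = words_to_tags(gold)                      # one tag per character
features = [char_features(sentence, i) for i in range(len(sentence))]
pred = ["我们", "喜", "欢", "自然语言"]          # a hypothetical system output
print(tags, f_score(gold, pred))
```

In a full pipeline, the per-character feature dictionaries and tag sequences would be passed to a CRF toolkit for training; that step is omitted here because the abstract does not name the toolkit or the tuned parameters.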

References

  1. Chen, X., Qiu, X., Zhu, C., Huang, X.: Gated recursive neural network for Chinese word segmentation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1744–1753 (2015)
  2. Yao, Y., Huang, Z.: Bi-directional LSTM recurrent neural network for Chinese word segmentation. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9950, pp. 345–353. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46681-1_42
  3. Wang, X., Li, C., Chen, J.: Chinese word segmentation method based on expanded convolutional neural network model. J. Chin. Inform. 33(9), 154–161 (2019)
  4. Xue, N.: Chinese word segmentation as character tagging. Comput. Linguist. Chin. Lang. 8(1), 29–47 (2003)
  5. Low, J., Ng, H., Guo, W.: A maximum entropy approach to Chinese word segmentation. In: Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, pp. 161–164 (2005)
  6. Zhao, H., Huang, C., Li, M.: An improved Chinese word segmentation system with conditional random field. In: Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing, pp. 162–165 (2006)
  7. Mao, X., et al.: Chinese word segmentation and named entity recognition based on conditional random fields. In: Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing (2008)
  8. Tsai, R., et al.: On closed task of Chinese word segmentation: an improved CRF model coupled with character clustering and automatically generated template matching. In: Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing, pp. 108–117 (2006)
  9. Huang, W., et al.: Toward fast and accurate neural Chinese word segmentation with multi-criteria learning. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 2062–2072 (2020)
  10. Deng, L., Luo, Z.: Domain adaptation of Chinese word segmentation on semi-supervised conditional random fields. J. Chin. Inform. 31(4), 9–19 (2017)
  11. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML) (2002)
  12. Peng, F., Feng, F., McCallum, A.: Chinese segmentation and new word detection using conditional random fields. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING, pp. 562–568 (2004)
  13. Huang, C., Zhao, H.: Chinese word segmentation: a decade review. J. Chin. Inform. Process. 21(3), 8–19 (2007)

Acknowledgement

This work was supported by the High-level Innovation and Entrepreneurship Talents Introduction Program of Jiangsu Province of China, 2019.

Author information

Corresponding author

Correspondence to Chao Fan.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fan, C., Li, Y. (2021). Research on Chinese Word Segmentation Based on Conditional Random Fields. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837. Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_27

  • DOI: https://doi.org/10.1007/978-3-030-84529-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-84528-5

  • Online ISBN: 978-3-030-84529-2

  • eBook Packages: Computer Science, Computer Science (R0)
