Word Segmentation of Micro Blogs with Bagging

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9362)

Abstract

This paper describes the model we designed for the Chinese word segmentation task of NLPCC 2015. We first apply a word-based perceptron algorithm to build the base segmenter. We then use bootstrap aggregating (bagging), which consistently improves the segmentation results on all three tracks: closed, semi-open, and open. Considering the characteristics of Weibo text, we also perform rule-based adaptation before decoding. Our final model achieves an F-score of 95.12% on the closed track, 95.3% on the semi-open track, and 96.09% on the open track.
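
The abstract does not say how the bagged segmenters are combined, so the following is only a minimal sketch of the general idea under stated assumptions: each base model is trained on a bootstrap resample of the training sentences, and the outputs are merged by majority vote on each character boundary. The names `train_maxmatch`, `train_bagged`, and `bagged_segment` are hypothetical, and the toy maximum-matching learner merely stands in for the word-based perceptron segmenter used in the paper.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def train_maxmatch(sentences):
    """Toy base learner (assumption, NOT the paper's perceptron):
    forward maximum matching over the words seen in training."""
    vocab = {w for sent in sentences for w in sent}
    max_len = max((len(w) for w in vocab), default=1)

    def segment(text):
        words, i = [], 0
        while i < len(text):
            # Try the longest candidate first; a single character always matches.
            for size in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in vocab:
                    words.append(piece)
                    i += size
                    break
        return words

    return segment


def train_bagged(train_fn, sentences, n_models, seed=0):
    """Train n_models base segmenters, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(sentences, rng)) for _ in range(n_models)]


def words_to_boundaries(words):
    """flags[i] is True when a word ends between character i and i + 1."""
    flags = []
    for w in words:
        flags.extend([False] * (len(w) - 1) + [True])
    return flags[:-1]  # the boundary after the last character is implicit


def boundaries_to_words(text, flags):
    """Rebuild the word list from per-gap boundary flags."""
    if not text:
        return []
    words, start = [], 0
    for i, cut in enumerate(flags):
        if cut:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


def bagged_segment(text, segmenters):
    """Keep a boundary only if a strict majority of segmenters propose it."""
    votes = [words_to_boundaries(seg(text)) for seg in segmenters]
    n = len(segmenters)
    flags = [2 * sum(column) > n for column in zip(*votes)]
    return boundaries_to_words(text, flags)


# Voting demo with three hand-written "segmenters" that disagree.
demo = [lambda t: ["ab", "c"], lambda t: ["ab", "c"], lambda t: ["a", "bc"]]
print(bagged_segment("abc", demo))  # ['ab', 'c']
```

Voting on boundaries rather than on whole segmentations guarantees that the merged output is always a valid segmentation of the input; other combination schemes, such as averaging model weights, are equally plausible readings of the abstract.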

Author information

Corresponding author

Correspondence to Xin-Yu Dai.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, Z., Dai, XY., Shen, S., Huang, S., Chen, J. (2015). Word Segmentation of Micro Blogs with Bagging. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_54

  • DOI: https://doi.org/10.1007/978-3-319-25207-0_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25206-3

  • Online ISBN: 978-3-319-25207-0

  • eBook Packages: Computer Science, Computer Science (R0)
