The Application of Kalman Filter Based Human-Computer Learning Model to Chinese Word Segmentation

Zhu, Weimeng; Sun, Ni; Zou, Xiaojun; Hu, Junfeng

doi:10.1007/978-3-642-37247-6_18


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper presents a human-computer interactive learning model for segmenting Chinese text that depends on neither a lexicon nor an annotated corpus. It enables users to add linguistic knowledge to the system by intervening directly in the segmentation process. Within a limited number of user interventions, the system returns a segmentation that fully matches the user's intended use (that is, one with 100% accuracy by manual judgement). A Kalman filter based model is adopted to learn and estimate the users' intentions quickly and precisely from their interventions, so as to reduce subsequent system prediction errors. Experiments show that the approach achieves encouraging performance in saving human effort, and that the segmenter with knowledge learned from users outperforms the baseline model by about 10% in segmenting homogeneous texts.
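The abstract does not specify the state, observation, or noise models used, so the following is only a minimal sketch of the general idea of a Kalman filter update, not the authors' model. It assumes a single scalar hidden value (for example, a hypothetical weight expressing how strongly the user prefers a boundary in some context), a random-walk state model, and hand-picked noise variances; the class name `ScalarKalman` and all parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Textbook 1-D Kalman filter with a random-walk state model (illustrative only)."""
    x: float = 0.0   # current estimate of the hidden value
    p: float = 1.0   # variance (uncertainty) of that estimate
    q: float = 1e-3  # process noise variance (assumed value)
    r: float = 0.1   # observation noise variance (assumed value)

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, but uncertainty grows by q.
        p_pred = self.p + self.q
        # Gain: how much to trust the new observation relative to the prior.
        k = p_pred / (p_pred + self.r)
        # Correct: move the estimate toward the observation and shrink the variance.
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


# Hypothetical usage: each observation z is a signal derived from one user intervention.
kf = ScalarKalman()
for z in [1.0, 1.0, 1.0]:
    kf.update(z)
```

With repeated consistent observations the estimate moves toward the observed value while its variance shrinks, which is the sense in which such a filter can pick up a user's preference from a small number of interventions.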



Author information

Authors and Affiliations

School of Electronics Engineering & Computer Science, Peking University, Beijing, 100871, P.R. China
Weimeng Zhu & Junfeng Hu
Key Laboratory of Computational Linguistics, Ministry of Education, Peking University, Beijing, 100871, P.R. China
Ni Sun, Xiaojun Zou & Junfeng Hu


Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

Cite this paper

Zhu, W., Sun, N., Zou, X., Hu, J. (2013). The Application of Kalman Filter Based Human-Computer Learning Model to Chinese Word Segmentation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_18

DOI: https://doi.org/10.1007/978-3-642-37247-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
