Abstract
New word detection is important for Chinese text information processing because it directly affects the performance of word segmentation, information retrieval, and automatic translation. This paper proposes an independence-testing-based approach to Chinese new word detection that requires no prior information. It analyzes the statistical characteristics of new words in Chinese texts, uses statistical hypothesis testing to infer the correlation between adjacent semantic units, and proposes an iterative algorithm that detects new words gradually. The algorithm is evaluated on both a large-scale corpus and short news texts. Experimental results show that the approach effectively detects new words in news texts of various kinds.
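The abstract does not say which test statistic or merging rule the paper uses, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a Pearson chi-square test of independence on a 2x2 table of adjacent-unit counts, a significance level of 0.05 (critical value 3.841 at one degree of freedom), a greedy left-to-right merge, and a minimum pair count. The names `chi_square`, `is_dependent`, `merge`, `detect_new_words`, `min_count`, and `max_rounds` are hypothetical.

```python
from collections import Counter

# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed level).
CHI2_CRITICAL = 3.841


def chi_square(n11, n1_, n_1, n):
    """Pearson chi-square for the 2x2 table of an adjacent pair (a, b).

    n11: count of a followed by b; n1_: count of a as left unit;
    n_1: count of b as right unit; n: total number of adjacent pairs.
    """
    n12 = n1_ - n11               # a followed by something other than b
    n21 = n_1 - n11               # b preceded by something other than a
    n22 = n - n1_ - n_1 + n11     # pairs involving neither
    denom = n1_ * n_1 * (n - n1_) * (n - n_1)
    if denom == 0:
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


def is_dependent(n11, n1_, n_1, n):
    """Reject independence only when the association is positive."""
    n12, n21, n22 = n1_ - n11, n_1 - n11, n - n1_ - n_1 + n11
    positive = n11 * n22 > n12 * n21
    return positive and chi_square(n11, n1_, n_1, n) > CHI2_CRITICAL


def merge(seq, dependent):
    """Greedily join adjacent units whose pair was judged dependent."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) in dependent:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def detect_new_words(sentences, min_count=2, max_rounds=5):
    """Iteratively merge dependent adjacent units, starting from characters."""
    seqs = [list(s) for s in sentences]
    for _ in range(max_rounds):
        pairs, left, right = Counter(), Counter(), Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                pairs[(a, b)] += 1
                left[a] += 1
                right[b] += 1
        n = sum(pairs.values())
        dependent = {
            p for p, c in pairs.items()
            if c >= min_count and is_dependent(c, left[p[0]], right[p[1]], n)
        }
        if not dependent:
            break  # no pair rejects independence, so stop iterating
        seqs = [merge(s, dependent) for s in seqs]
    # Report every surviving unit longer than one character as a candidate.
    return {u for s in seqs for u in s if len(u) > 1}


# Toy corpus (Latin letters stand in for Chinese characters).
corpus = ["xabc", "yabc", "abcz", "abcw", "xy", "zw", "yz", "wx"]
candidates = detect_new_words(corpus)  # → {'abc'}
```

On the toy corpus, the first round joins `a` and `b`, and the second round joins `ab` and `c`, which illustrates how candidates can grow gradually. With counts this small the chi-square approximation is unreliable, so a real corpus and a more careful choice of test and thresholds would be needed.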
This work was supported by the Fundamental Research Funds for the Central Universities (No. BLX2015-17) and the National Natural Science Foundation of China (No. 61702025).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, D., Chen, X., Yang, X. (2018). A Chinese New Word Detection Approach Based on Independence Testing. In: Fleuriot, J., Wang, D., Calmet, J. (eds) Artificial Intelligence and Symbolic Computation. AISC 2018. Lecture Notes in Computer Science, vol 11110. Springer, Cham. https://doi.org/10.1007/978-3-319-99957-9_17
DOI: https://doi.org/10.1007/978-3-319-99957-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99956-2
Online ISBN: 978-3-319-99957-9
eBook Packages: Computer Science, Computer Science (R0)