A Chinese New Word Detection Approach Based on Independence Testing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11110)

Abstract

New word detection matters for Chinese text information processing because it directly affects the performance of word segmentation, information retrieval, and automatic translation. This paper proposes a detection approach based on independence testing that requires no prior information. It analyzes the statistical characteristics of new words in Chinese texts, uses statistical hypothesis testing to infer the correlations between adjacent semantic units, and proposes an iterative algorithm that detects new words step by step. The algorithm is evaluated on both a large-scale corpus and short news texts. Experimental results show that the approach can effectively detect new words in all kinds of news texts.

This work was supported by Fundamental Research Funds for Central Universities (No. BLX2015-17) and National Nature Science Foundation of China (No. 61702025).
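The abstract names the ingredients (an independence test on adjacent units, applied iteratively) but not the details, so the following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes a Pearson chi-square test on a 2x2 contingency table, a 0.05 significance level, a minimum pair count, and a greedy left-to-right merge; the names `chi_square`, `detect_candidates`, `min_count`, and `max_rounds` are hypothetical.

```python
from collections import Counter

# Assumption: critical value of the chi-square distribution with
# 1 degree of freedom at significance level 0.05.
CHI2_CRITICAL = 3.841


def chi_square(n_ab, n_a, n_b, n):
    """Pearson chi-square statistic for the 2x2 table
    (first unit is a / not a) x (second unit is b / not b).

    n_ab: adjacent pairs (a, b); n_a: pairs starting with a;
    n_b: pairs ending with b; n: all adjacent pairs.
    """
    o11 = n_ab
    o12 = n_a - n_ab
    o21 = n_b - n_ab
    o22 = n - n_a - n_b + n_ab
    denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denom == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


def detect_candidates(sentences, min_count=2, max_rounds=3):
    """Iteratively merge adjacent units whose independence is rejected."""
    seqs = [list(s) for s in sentences]  # start from single characters
    found = set()
    for _ in range(max_rounds):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        n = sum(pairs.values())
        first, second = Counter(), Counter()
        for (a, b), c in pairs.items():
            first[a] += c
            second[b] += c
        # Keep pairs that occur more often than independence predicts
        # and whose statistic exceeds the critical value.
        merge = {
            (a, b)
            for (a, b), c in pairs.items()
            if c >= min_count
            and c * n > first[a] * second[b]
            and chi_square(c, first[a], second[b], n) > CHI2_CRITICAL
        }
        if not merge:
            break
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) in merge:
                    out.append(seq[i] + seq[i + 1])  # greedy merge
                    found.add(out[-1])
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return found


# Toy corpus: only the pair 微信 recurs; every other character is unique.
print(detect_candidates(["我用微信聊", "他玩微信呢", "你开微信吧", "她有微信吗"]))
# prints {'微信'}
```

Because merged units are fed back in as single units, a later round can join them with a neighbor, which is how longer candidates would emerge in this sketch. The chi-square approximation is unreliable at counts as small as those in the toy corpus; a real corpus would need far more data and probably a stricter threshold.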

Author information

Corresponding authors

Correspondence to Dongchen Jiang or Xiaoyu Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, D., Chen, X., Yang, X. (2018). A Chinese New Word Detection Approach Based on Independence Testing. In: Fleuriot, J., Wang, D., Calmet, J. (eds) Artificial Intelligence and Symbolic Computation. AISC 2018. Lecture Notes in Computer Science, vol 11110. Springer, Cham. https://doi.org/10.1007/978-3-319-99957-9_17

  • DOI: https://doi.org/10.1007/978-3-319-99957-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99956-2

  • Online ISBN: 978-3-319-99957-9

  • eBook Packages: Computer Science, Computer Science (R0)
