Chinese Natural Chunk Research Based on Natural Annotations in Massive Scale Corpora

Exploring Work on Natural Chunk Recognition Using Explicit Boundary Indicators

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2013, CCL 2013)

Abstract

The rapid growth in corpus size has brought great changes to Natural Language Processing (NLP) research, and NLP based on massive-scale natural annotations has become a new research hotspot. We summarize the state of the art in NLP based on massive-scale naturally annotated resources and propose a new concept, the “Natural Chunk”. In this paper, we analyze its concept and properties and conduct experiments on natural chunk recognition, which demonstrate the feasibility of recognizing natural chunks from natural annotations. Chinese natural chunk research, as a new direction in language boundary recognition, has a positive influence on Chinese computing and a promising future.

Supported by NSFC (61170162), State Language Commission (YB125-42), National Science-technology Support Plan Projects (2012BAH16F00) and the Fundamental Research Funds for the Central Universities (13YCX192).




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, Ze., Xun, Ed., Rao, Gq., Yu, D. (2013). Chinese Natural Chunk Research Based on Natural Annotations in Massive Scale Corpora. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2013, CCL 2013. Lecture Notes in Computer Science, vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_2

  • DOI: https://doi.org/10.1007/978-3-642-41491-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41490-9

  • Online ISBN: 978-3-642-41491-6

  • eBook Packages: Computer Science, Computer Science (R0)
