Abstract
To automatically extract Chinese collocations and build a large-scale collocation bank, we are developing a one-million-word Chinese shallow-parsed treebank. The treebank serves both as a training set for our shallow parser and as processed data from which collocations are extracted. This paper presents several issues related to this ongoing project, such as our definition of shallow parsing as used in Chinese collocation extraction, guideline preparation, and quality control.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baoli, L., Qin, L., Yin, L. (2003). Building a Chinese Shallow Parsed TreeBank for Collocation Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_41
DOI: https://doi.org/10.1007/3-540-36456-0_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6
eBook Packages: Springer Book Archive