Chinese Chunking with Tri-training Learning

Chen, Wenliang; Zhang, Yujie; Isahara, Hitoshi

doi:10.1007/11940098_49

Chinese Chunking with Tri-training Learning

Wenliang Chen^22,23,
Yujie Zhang²² &
Hitoshi Isahara²²

Conference paper

1025 Accesses
6 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4285))

Abstract

This paper presents a practical tri-training method for Chinese chunking using a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreements of three classifiers. In detail, in each iteration, a new sample is selected for a classifier if the other two classifiers agree on the labels while itself disagrees. We compare the proposed tri-training learning approach with co-training learning approach on Upenn Chinese Treebank V4.0(CTB4). The experimental results show that the proposed approach can improve the performance significantly.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abney, S.P.: Parsing by chunks. In: Berwick, R.C., Abney, S.P., Tenny, C. (eds.) Principle-Based Parsing: Computation and Psycholinguistics, pp. 257–278. Kluwer, Dordrecht (1991)
Google Scholar
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Yarovsky, D., Church, K. (eds.) Proceedings of the Third Workshop on Very Large Corpora, Somerset, New Jersey, Association for Computational Linguistics, pp. 82–94 (1995)
Google Scholar
Sang, E.F.T.K., Buchholz, S.: Introduction to the conll-2000 shared task: Chunking. In: Proceedings of CoNLL 2000 and LLL 2000, Lisbin, Portugal, pp. 127–132 (2000)
Google Scholar
Li, H., Webster, J.J., Kit, C., Yao, T.: Transductive hmm based chinese text chunking. In: Proceedings of IEEE NLP-KE2003, Beijing, China, pp. 257–262 (2003)
Google Scholar
Tan, Y., Yao, T., Chen, Q., Zhu, J.: Applying conditional random fields to chinese shallow parsing. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 167–176. Springer, Heidelberg (2005)
Chapter Google Scholar
Wu, S.H., Shih, C.W., Wu, C.W., Tsai, T.H., Hsu, W.L.: Applying maximum entropy to robust chinese shallow parsing. In: Proceedings of ROCLING 2005 (2005)
Google Scholar
Zhao, T., Yang, M., Liu, F., Yao, J., Yu, H.: Statistics based hybrid approach to chinese base phrase identification. In: Proceedings of Second Chinese Language Processing Workshop (2000)
Google Scholar
Zhou, Z.H., Li, M.: Tri-training: Exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering 17, 1529–1541 (2005)
Article Google Scholar
Chen, W., Zhang, Y., Isahara, H.: An empirical study of chinese chunking. In: Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, Association for Computational Linguistics, pp. 97–104 (2006)
Google Scholar
Steedman, M., Hwa, R., Clark, S., Osborne, M., Sarkar, A., Hockenmaier, J., Ruhlen, P., Baker, S., Crim, J.: Example selection for bootstrapping statistical parsers. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 157–164 (2003)
Google Scholar
Pham, T., Ng, H., Lee, W.: Word sense disambiguation with semi-supervised learning. In: AAAI 2005, The Twentieth National Conference on Artificial Intelligence (2005)
Google Scholar
Yarowsky, D.: Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL) (1995)
Google Scholar
Collins, M., Singer, Y.: Unsupervised models for named entity classification. In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 100–110 (1999)
Google Scholar
Ando, R., Zhang, T.: A high-performance semi-supervised learning method for text chunking. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) (2005)
Google Scholar
Steedman, M., Osborne, M., Sarkar, A., Clark, S., Hwa, R., Hockenmaier, J., Ruhlen, P., Baker, S., Crim, J.: Bootstrapping statistical parsers from small datasets. In: The Proceedings of the Annual Meeting of the European Chapter of the ACL, pp. 331–338 (2003)
Google Scholar
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on Computational learning theory, pp. 92–100 (1998)
Google Scholar
Sang, E.F.T.K.: Memory-based shallow parsing. JMLR 2, 559–594 (2002)
Article MATH Google Scholar
Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL 2003 (2003)
Google Scholar
Kudo, T., Matsumoto, Y.: Chunking with support vector machines. In: Proceedings of NAACL 2001 (2001)
Google Scholar

Download references

Author information

Authors and Affiliations

Computational Linguistics Group, National Institute of Information and Communications Technology, 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
Wenliang Chen, Yujie Zhang & Hitoshi Isahara
Natural Language Processing Lab, Northeastern University, Shenyang, 110004, China
Wenliang Chen

Authors

Wenliang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yujie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Hitoshi Isahara
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, W., Zhang, Y., Isahara, H. (2006). Chinese Chunking with Tri-training Learning. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science(), vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_49

Download citation

DOI: https://doi.org/10.1007/11940098_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics