Abstract
This paper presents a practical tri-training method for Chinese chunking that uses a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training in which newly labeled sentences are selected by comparing the agreement among three classifiers. Specifically, in each iteration, a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with a co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach improves performance significantly.
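The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "agree on the labels" means the two other classifiers produce identical label sequences for the whole sentence, and it omits the retraining step. The names `select_for` and `tri_training_round`, the toy taggers, and the `B-NP`/`I-NP`/`O` label scheme are hypothetical and used only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Sequence[str]
Labels = List[str]
Tagger = Callable[[Sentence], Labels]


def select_for(target: Tagger, other_a: Tagger, other_b: Tagger,
               unlabeled: Sequence[Sentence]) -> List[Tuple[Sentence, Labels]]:
    """Pick sentences on which the two other taggers agree but `target` differs.

    Assumption: agreement is tested on the full label sequence of a sentence.
    """
    selected = []
    for sentence in unlabeled:
        labels_a = other_a(sentence)
        labels_b = other_b(sentence)
        if labels_a == labels_b and target(sentence) != labels_a:
            # The agreed labels become new training data for `target`.
            selected.append((sentence, labels_a))
    return selected


def tri_training_round(taggers: Sequence[Tagger],
                       unlabeled: Sequence[Sentence]):
    """Return, for each of the three taggers, its newly labeled sentences."""
    new_data = []
    for i in range(3):
        j, k = [n for n in range(3) if n != i]
        new_data.append(select_for(taggers[i], taggers[j], taggers[k], unlabeled))
    return new_data


# Hypothetical toy taggers standing in for trained chunkers.
def tag_all_np(sentence: Sentence) -> Labels:
    return ["B-NP"] + ["I-NP"] * (len(sentence) - 1)


def tag_first_np(sentence: Sentence) -> Labels:
    return ["B-NP"] + ["O"] * (len(sentence) - 1)


pool = [["中国", "经济"], ["发展"]]
result = tri_training_round([tag_all_np, tag_all_np, tag_first_np], pool)
```

In this toy run only the third tagger receives a new sample: the first two agree on the two-word sentence while the third disagrees. All three agree on the one-word sentence, so it is not selected for anyone.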
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, W., Zhang, Y., Isahara, H. (2006). Chinese Chunking with Tri-training Learning. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_49
DOI: https://doi.org/10.1007/11940098_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)