Chinese Chunking with Tri-training Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

This paper presents a practical tri-training method for Chinese chunking that uses a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreement among three classifiers. Specifically, in each iteration a new sample is selected for a classifier if the other two classifiers agree on its labels while that classifier itself disagrees. We compare the proposed tri-training approach with the co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach can significantly improve performance.
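
The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_for_each_classifier`, the representation of a prediction as a tuple of chunk tags, and the choice to test agreement on the whole tag sequence of a sentence are all assumptions made here for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a prediction is one chunk tag per token, e.g. ("B-NP", "I-NP").
Tags = Tuple[str, ...]


def select_for_each_classifier(
    predictions: Sequence[Sequence[Tags]],
) -> Dict[int, List[Tuple[int, Tags]]]:
    """Pick, for each classifier, sentences the other two agree on but it does not.

    predictions[c][s] is classifier c's tag sequence for unlabeled sentence s.
    Returns {classifier index: [(sentence index, agreed tag sequence), ...]}.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    n_sentences = len(predictions[0])
    selected: Dict[int, List[Tuple[int, Tags]]] = {0: [], 1: [], 2: []}
    for i in range(3):
        # j and k are the two "other" classifiers for classifier i.
        j, k = [c for c in range(3) if c != i]
        for s in range(n_sentences):
            others_agree = predictions[j][s] == predictions[k][s]
            # Assumption: disagreement means the full tag sequences differ.
            if others_agree and predictions[i][s] != predictions[j][s]:
                selected[i].append((s, predictions[j][s]))
    return selected


# Hypothetical predictions for three unlabeled two-token sentences.
demo = [
    [("B-NP", "I-NP"), ("B-VP", "O"), ("B-NP", "O")],  # classifier 0
    [("B-NP", "I-NP"), ("B-NP", "O"), ("B-NP", "O")],  # classifier 1
    [("B-NP", "I-NP"), ("B-NP", "O"), ("B-VP", "O")],  # classifier 2
]
newly_labeled = select_for_each_classifier(demo)
```

In this toy input, sentence 0 is selected for no classifier because all three agree, sentence 1 is selected for classifier 0, and sentence 2 is selected for classifier 2. In each case the label comes from the two agreeing classifiers. A full tri-training loop would then retrain each classifier on its original labeled data plus its selected sentences, a step this sketch leaves out.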

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, W., Zhang, Y., Isahara, H. (2006). Chinese Chunking with Tri-training Learning. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_49

  • DOI: https://doi.org/10.1007/11940098_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
