DOI: 10.1145/1645953.1646145
poster

Label correspondence learning for part-of-speech annotation transformation

Published: 02 November 2009

ABSTRACT

The performance of machine learning methods depends heavily on the volume of training data. To enlarge datasets, it is of interest to study the problem of unifying multiple labeled datasets that follow different annotation standards. In this paper, we focus on unifying datasets for sequence labeling problems, with natural language part-of-speech (POS) tagging as an exemplar application. To this end, we propose a probabilistic approach to transforming the annotations of one dataset to the standard specified by another. The key component of the approach, named label correspondence learning, serves as a bridge between the annotations of the two datasets. Two methods designed from distinct perspectives are proposed to attack this sub-problem. Experiments on two large-scale part-of-speech datasets demonstrate the efficacy of the transformation and label correspondence learning methods.
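The abstract does not spell out the two label correspondence learning methods, so the following is only an illustrative sketch of the general idea, not the authors' approach. It assumes that each token carries both a tag in the source standard and a tag in the target standard (for example, produced by a tagger trained on the target dataset), and estimates a conditional table P(target tag | source tag) by relative frequency. The function names and the toy tag sets are hypothetical.

```python
from collections import Counter, defaultdict


def learn_correspondence(source_tags, target_tags):
    """Estimate P(target tag | source tag) from token-aligned tag pairs.

    Assumption: the two sequences are aligned token by token.
    """
    counts = defaultdict(Counter)
    for s, t in zip(source_tags, target_tags):
        counts[s][t] += 1
    # Normalize each row of co-occurrence counts into a distribution.
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }


def transform(source_tags, correspondence):
    """Map each source tag to its most probable target tag."""
    return [
        max(correspondence[s], key=correspondence[s].get)
        for s in source_tags
    ]


# Toy, made-up data: source standard uses NN/VV, target uses n/v/vn.
source = ["NN", "NN", "VV", "NN", "VV"]
target = ["n", "n", "v", "vn", "v"]

table = learn_correspondence(source, target)
converted = transform(["VV", "NN"], table)
```

A context-free table like this ignores the surrounding words; a sequence model would be needed to resolve source tags that split into several target tags depending on context.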


Published in

CIKM '09: Proceedings of the 18th ACM Conference on Information and Knowledge Management
November 2009
2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953

Copyright © 2009 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
