poster

Label correspondence learning for part-of-speech annotation transformation

Authors:
Muhua Zhu, Northeastern University, Shenyang, China
Huizhen Wang, Northeastern University, Shenyang, China
Jingbo Zhu, Northeastern University, Shenyang, China

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management, November 2009, Pages 1461–1464, https://doi.org/10.1145/1645953.1646145

Published: 02 November 2009

ABSTRACT

The performance of machine learning methods depends heavily on the volume of training data used. To enlarge datasets, it is of interest to study the problem of unifying multiple labeled datasets that follow different annotation standards. In this paper, we focus on unifying datasets for sequence labeling problems, with natural language part-of-speech (POS) tagging as an exemplar application. To this end, we propose a probabilistic approach to transforming the annotations of one dataset to the standard specified by another dataset. The key component of the approach, named label correspondence learning, serves as a bridge between the annotations of the two datasets. Two methods designed from distinct perspectives are proposed to attack this sub-problem. Experiments on two large-scale part-of-speech datasets demonstrate the efficacy of the transformation and label correspondence learning methods.
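
As a rough illustration only: the abstract does not describe the two proposed methods, so the sketch below is not the authors' approach. It shows one simple way to picture a label correspondence, as a conditional distribution P(target tag | source tag) estimated from co-occurrence counts on tokens that carry both annotations. The function names `learn_correspondence` and `transform`, and the toy tag pairs, are hypothetical.

```python
from collections import Counter, defaultdict


def learn_correspondence(pairs):
    """Estimate P(target_tag | source_tag) from (source_tag, target_tag) pairs.

    Assumption: each pair comes from a token annotated under both standards.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1

    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        # Relative frequency of each target tag given the source tag.
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def transform(tags, table):
    """Map each source tag to its most probable target tag.

    Source tags missing from the table are passed through unchanged.
    """
    out = []
    for tag in tags:
        dist = table.get(tag)
        out.append(max(dist, key=dist.get) if dist else tag)
    return out


# Toy, made-up tag pairs (source standard, target standard).
pairs = [("n", "NN"), ("n", "NN"), ("n", "NR"),
         ("v", "VV"), ("v", "VV"), ("a", "VA")]
table = learn_correspondence(pairs)
converted = transform(["n", "v", "x"], table)
```

This context-free mapping ignores neighboring words and tags; a sequence labeling setting such as POS tagging would normally condition on context as well.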

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953
General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)
Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
annotation transformation
natural language processing
part-of-speech tagging
sequence labeling
Qualifiers
- poster