Labelwise Margin Maximization for Sequence Labeling

Gao, Wenjun; Qiu, Xipeng; Huang, Xuanjing

doi:10.1007/978-3-642-19400-9_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In sequence labeling problems, the objective functions of most learning algorithms are inconsistent with evaluation measures such as Hamming loss. In this paper, we propose an online learning algorithm that addresses the problem of labelwise margin maximization for sequence labeling. We decompose the sequence margin into per-label margins and maximize these per-label margins individually, which in turn can minimize the Hamming loss of the sequence. We compare our algorithm with three state-of-the-art methods on three tasks, and the experimental results show that our algorithm outperforms the others.
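
As a rough illustration of the quantities named in the abstract, the sketch below computes the Hamming loss between two label sequences and a simple per-label margin at each position. It is not the algorithm from the paper, whose details are not given here. The function names and the `scores` table are hypothetical, and the sketch assumes each position is scored independently, with no transition features between neighboring labels.

```python
from typing import List, Sequence


def hamming_loss(gold: Sequence[int], pred: Sequence[int]) -> int:
    """Count the positions where the predicted label differs from the gold label."""
    if len(gold) != len(pred):
        raise ValueError("sequences must have the same length")
    return sum(g != p for g, p in zip(gold, pred))


def labelwise_margins(scores: Sequence[Sequence[float]],
                      gold: Sequence[int]) -> List[float]:
    """Per-position margin: gold-label score minus the best competing score.

    scores[i][y] is a hypothetical score for label y at position i
    (assumption: positions are scored independently of each other).
    """
    margins = []
    for row, g in zip(scores, gold):
        best_other = max(s for y, s in enumerate(row) if y != g)
        margins.append(row[g] - best_other)
    return margins


# Toy example with two labels (0 and 1) and three positions.
gold = [0, 1, 1]
scores = [[2.0, 0.5], [1.0, 1.5], [0.75, 0.25]]

# Position-wise argmax prediction under the independence assumption.
pred = [max(range(len(row)), key=row.__getitem__) for row in scores]
margins = labelwise_margins(scores, gold)
```

In this toy setting, a position is mislabeled exactly when its margin is negative, so pushing every per-label margin above zero drives the Hamming loss to zero. That is the intuition behind maximizing per-label margins rather than a single sequence-level margin.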

Author information

Authors and Affiliations

School of Computer Science, Fudan University, China
Wenjun Gao, Xipeng Qiu & Xuanjing Huang

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander F. Gelbukh

Cite this paper

Gao, W., Qiu, X., Huang, X. (2011). Labelwise Margin Maximization for Sequence Labeling. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_10

DOI: https://doi.org/10.1007/978-3-642-19400-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)
