Labelwise Margin Maximization for Sequence Labeling

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Abstract

In sequence labeling problems, the objective functions of most learning algorithms are inconsistent with evaluation measures such as Hamming loss. In this paper, we propose an online learning algorithm that addresses the problem of labelwise margin maximization for sequence labeling. We decompose the sequence margin into per-label margins and maximize these per-label margins individually, which can in turn minimize the Hamming loss of the sequence. We compare our algorithm with three state-of-the-art methods on three tasks, and the experimental results show that our algorithm outperforms the others.
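
To illustrate the mismatch the abstract refers to, the following minimal Python sketch contrasts Hamming loss (the number of mislabeled positions) with a sequence-level 0/1 loss. It is not the paper's algorithm: the function names `hamming_loss` and `zero_one_loss`, the unnormalized form of the loss, and the toy label sequences are assumptions made only for this example.

```python
from typing import Sequence


def hamming_loss(gold: Sequence[str], pred: Sequence[str]) -> int:
    """Count the positions whose predicted label differs from the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g != p for g, p in zip(gold, pred))


def zero_one_loss(gold: Sequence[str], pred: Sequence[str]) -> int:
    """Sequence-level loss: 1 if any position is wrong, 0 otherwise."""
    return int(list(gold) != list(pred))


# Toy data (hypothetical): one gold sequence and two predictions.
gold = ["B", "I", "O", "B", "I"]
pred_a = ["B", "I", "O", "B", "O"]  # only the last label is wrong
pred_b = ["O", "O", "B", "I", "O"]  # every label is wrong

for name, pred in [("pred_a", pred_a), ("pred_b", pred_b)]:
    print(name, "0/1 loss:", zero_one_loss(gold, pred),
          "Hamming loss:", hamming_loss(gold, pred))
```

Both predictions incur the same sequence-level 0/1 loss, yet their Hamming losses differ (1 versus 5 mislabeled positions). An objective defined on whole sequences cannot tell them apart, which is the kind of gap a per-label objective is meant to close.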

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, W., Qiu, X., Huang, X. (2011). Labelwise Margin Maximization for Sequence Labeling. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_10

  • DOI: https://doi.org/10.1007/978-3-642-19400-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19399-6

  • Online ISBN: 978-3-642-19400-9

  • eBook Packages: Computer Science, Computer Science (R0)
