The application of structured learning in natural language processing

Machine Translation

Abstract

We propose a structured learning approach, max-margin structure (MMS), targeted at natural language processing (NLP) tasks. The architecture of our approach is shown to capture structural aspects of the problem domains, leading to demonstrable performance improvements on two NLP tasks: part-of-speech tagging and statistical machine translation (SMT). We present a perceptron-based online learning algorithm to train the model and demonstrate that it scales better computationally than traditional optimisation methods.
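
To illustrate what perceptron-based online learning for a structured task can look like, the following is a minimal sketch of a generic structured perceptron for part-of-speech tagging. It is not the MMS model or the training algorithm of this article, whose details are not given in the abstract. The feature map, the toy sentences, the tag set, and all function names are assumptions made purely for illustration.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence symbol


def features(words, i, prev_tag, tag):
    """Assumed feature map: one emission and one transition indicator."""
    return [("emit", words[i], tag), ("trans", prev_tag, tag)]


def local_score(w, words, i, prev_tag, tag):
    """Sum the weights of the features active at position i."""
    return sum(w.get(f, 0.0) for f in features(words, i, prev_tag, tag))


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence under weights w."""
    # best[i][t] = (score of best path ending in tag t at position i, previous tag)
    best = [{t: (local_score(w, words, 0, START, t), None) for t in tags}]
    for i in range(1, len(words)):
        best.append({
            t: max((best[i - 1][p][0] + local_score(w, words, i, p, t), p)
                   for p in tags)
            for t in tags
        })
    # Pick the best final tag, then follow the back-pointers.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train(data, tags, epochs=5):
    """Online training: decode each sentence, update only on mistakes."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred == gold:
                continue
            # Reward features of the gold sequence, penalise the prediction.
            for seq, delta in ((gold, 1.0), (pred, -1.0)):
                prev = START
                for i, t in enumerate(seq):
                    for f in features(words, i, prev, t):
                        w[f] += delta
                    prev = t
    return w


# Toy data, invented for this sketch.
tags = ["DET", "NOUN", "VERB"]
data = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["dogs", "bark"], ["NOUN", "VERB"]),
]
w = train(data, tags, epochs=5)
print(viterbi(["the", "dog", "barks"], tags, w))
```

Each update touches only the features of one sentence, so the cost per step depends on sentence length and tag-set size rather than on the size of the training set. This is the usual appeal of online methods over batch optimisation. A max-margin variant would additionally require the gold sequence to outscore competitors by a margin, which this plain perceptron does not enforce.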

Author information

Corresponding author

Correspondence to Yizhao Ni.

About this article

Cite this article

Ni, Y., Saunders, C., Szedmak, S. et al. The application of structured learning in natural language processing. Machine Translation 24, 71–85 (2010). https://doi.org/10.1007/s10590-010-9078-1

  • DOI: https://doi.org/10.1007/s10590-010-9078-1
