The application of structured learning in natural language processing

Ni, Yizhao; Saunders, Craig; Szedmak, Sandor; Niranjan, Mahesan

doi:10.1007/s10590-010-9078-1

Published: 21 May 2010

Volume 24, pages 71–85 (2010)

Machine Translation

Abstract

We propose a structured learning approach, max-margin structure (MMS), which is targeted at natural language processing (NLP) tasks. The architecture of our approach is shown to capture structural aspects of the problem domains, leading to demonstrable performance improvements on two NLP tasks: part-of-speech tagging and statistical machine translation (SMT). We present a perceptron-based online learning algorithm to train the model and demonstrate desirable computational scaling behavior over traditional optimisation methods.
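
The abstract does not spell out the MMS model or its training procedure, so the following is only a minimal sketch of a generic structured perceptron for part-of-speech tagging, not the authors' algorithm. It illustrates the general idea of perceptron-style online training for a structured output: decode the best tag sequence under the current weights, then update the weights only when the prediction is wrong. The feature templates (`emit`, `trans`), the first-order Viterbi decoder, the toy data and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def features(words, seq):
    """Count emission and transition features of a tagged sentence (assumed templates)."""
    feats = defaultdict(int)
    prev = "<s>"  # hypothetical start-of-sentence symbol
    for word, tag in zip(words, seq):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence under weights w (first-order model)."""
    # score[i][t]: best score of any tag prefix that ends with tag t at position i
    score = [{t: w[("emit", words[0], t)] + w[("trans", "<s>", t)] for t in tags}]
    back = []
    for i in range(1, len(words)):
        col, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[-1][p] + w[("trans", p, t)])
            col[t] = score[-1][prev] + w[("trans", prev, t)] + w[("emit", words[i], t)]
            ptr[t] = prev
        score.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final tag to recover the full sequence.
    path = [max(tags, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def train(data, tags, epochs=5):
    """Online training: one additive update per mispredicted sentence."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred != gold:
                # Reward features of the gold sequence, penalise those of the prediction.
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w


# Toy data, invented purely for illustration.
tags = ["DET", "NOUN", "VERB"]
data = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
]
w = train(data, tags)
print(viterbi(["the", "dog", "barks"], tags, w))
```

Each update touches only the features of one sentence, so the cost of a training pass grows linearly with the number of sentences. This is the usual motivation for online methods over batch optimisation, although the sketch makes no claim about the scaling results reported in the article.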

Author information

Authors and Affiliations

ISIS Group, School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
Yizhao Ni, Sandor Szedmak & Mahesan Niranjan
Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240, Meylan, France
Craig Saunders

Corresponding author

Correspondence to Yizhao Ni.

About this article

Cite this article

Ni, Y., Saunders, C., Szedmak, S. et al. The application of structured learning in natural language processing. Machine Translation 24, 71–85 (2010). https://doi.org/10.1007/s10590-010-9078-1

Received: 30 October 2009
Accepted: 22 April 2010
Published: 21 May 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10590-010-9078-1
