Abstract
We propose a structured learning approach, max-margin structure (MMS), targeted at natural language processing (NLP) tasks. We show that the architecture of the approach captures structural aspects of the problem domains, which leads to measurable performance improvements on two NLP tasks: part-of-speech tagging and statistical machine translation (SMT). We present a perceptron-based online learning algorithm to train the model and demonstrate that its computational scaling behaviour compares favourably with that of traditional optimisation methods.
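To make the idea of perceptron-based online training for a structured task concrete, the following is a minimal sketch of a generic structured perceptron for part-of-speech tagging. It is not the MMS model or the authors' algorithm, whose details are not given in the abstract. The feature templates (word/tag emission and tag/tag transition counts), the function names, and the toy data are all assumptions chosen for illustration.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence marker


def features(words, tag_seq):
    """Count emission and transition features for one tagged sentence."""
    counts = defaultdict(int)
    prev = START
    for word, tag in zip(words, tag_seq):
        counts[("emit", word, tag)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence under the weights w."""
    # best[i][t] = (score of best path ending in tag t at position i, previous tag)
    best = [{t: (w[("emit", words[0], t)] + w[("trans", START, t)], None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + w[("trans", p, t)], p) for p in tags
            )
            column[t] = (score + w[("emit", words[i], t)], prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train(data, tags, epochs=5):
    """Online training: decode each sentence, update only on mistakes."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred != gold:
                # Move weights towards the gold structure, away from the prediction.
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w


# Toy data, invented for this sketch.
TAGS = ["DET", "NOUN", "VERB"]
DATA = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
]
weights = train(DATA, TAGS)
print(viterbi(["the", "cat", "barks"], TAGS, weights))
```

Each update touches only the features of one sentence, so the cost per step does not depend on the size of the training set. This is the usual motivation for online training over batch optimisation, although the sketch says nothing about how MMS itself scales.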
Cite this article
Ni, Y., Saunders, C., Szedmak, S. et al. The application of structured learning in natural language processing. Machine Translation 24, 71–85 (2010). https://doi.org/10.1007/s10590-010-9078-1
DOI: https://doi.org/10.1007/s10590-010-9078-1