Iterative Rule Segmentation under Minimum Description Length for Unsupervised Transduction Grammar Induction

Saers, Markus; Addanki, Karteek; Wu, Dekai

doi:10.1007/978-3-642-39593-2_20


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing


Abstract

We argue that for purely incremental unsupervised learning of phrasal inversion transduction grammars, a minimum description length driven, iterative top-down rule segmentation approach that is the polar opposite of Saers, Addanki, and Wu’s previous 2012 bottom-up iterative rule chunking model yields significantly better translation accuracy and grammar parsimony. We still aim for unsupervised bilingual grammar induction such that training and testing are optimized upon the same exact underlying model—a basic principle of machine learning and statistical prediction that has become unduly ignored in statistical machine translation models of late, where most decoders are badly mismatched to the training assumptions. Our novel approach learns phrasal translations by recursively subsegmenting the training corpus, as opposed to our previous model—where we start with a token-based transduction grammar and iteratively build larger chunks. Moreover, the rule segmentation decisions in our approach are driven by a minimum description length objective, whereas the rule chunking decisions were driven by a maximum likelihood objective. We demonstrate empirically how this trades off maximum likelihood against model size, aiming for a more parsimonious grammar that escapes the perfect overfitting to the training data that we start out with, and gradually generalizes to previously unseen sentence translations so long as the model shrinks enough to warrant a looser fit to the training data. Experimental results show that our approach produces a significantly smaller and better model than the chunking-based approach.

References

Wu, D.: Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. Computational Linguistics 23(3), 377–403 (1997)
Saers, M., Addanki, K., Wu, D.: From finite-state to inversion transductions: Toward unsupervised bilingual grammar induction. In: COLING 2012: Technical Papers, Mumbai, India, pp. 2325–2340 (December 2012)
Koehn, P., Och, F.J., Marcu, D.: Statistical Phrase-Based Translation. In: HLT/NAACL 2003, Edmonton, Canada, vol. 1 (May/June 2003)
Chiang, D.: A hierarchical phrase-based model for statistical machine translation. In: ACL 2005, Ann Arbor, Michigan, pp. 263–270 (June 2005)
Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The Mathematics of Machine Translation: Parameter estimation. Computational Linguistics 19(2) (1993)
Vogel, S., Ney, H., Tillmann, C.: HMM-based Word Alignment in Statistical Translation. In: COLING 1996, vol. 2, pp. 836–841 (1996)
Och, F.J., Ney, H.: A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1), 19–51 (2003)
Johnson, H., Martin, J., Foster, G., Kuhn, R.: Improving translation quality by discarding most of the phrasetable. In: EMNLP/CoNLL 2007, Prague, Czech Republic, pp. 967–975 (June 2007)
Cherry, C., Lin, D.: Inversion transduction grammar for joint phrasal translation modeling. In: SSST-1, Rochester, New York, pp. 17–24 (April 2007)
Zhang, H., Quirk, C., Moore, R.C., Gildea, D.: Bayesian learning of non-compositional phrases with synchronous parsing. In: ACL/HLT 2008, Columbus, Ohio, pp. 97–105 (June 2008)
Blunsom, P., Cohn, T., Dyer, C., Osborne, M.: A Gibbs sampler for phrasal synchronous grammar induction. In: ACL/IJCNLP 2009, Singapore (August 2009)
Haghighi, A., Blitzer, J., DeNero, J., Klein, D.: Better word alignments with supervised ITG models. In: ACL/IJCNLP 2009, Suntec, Singapore (August 2009)
Saers, M., Wu, D.: Improving phrase-based translation via word alignments from stochastic inversion transduction grammars. In: SSST-3, Boulder, CO (June 2009)
Saers, M., Wu, D.: Principled induction of phrasal bilexica. In: EAMT 2011, Leuven, Belgium, pp. 313–320 (May 2011)
Blunsom, P., Cohn, T.: Inducing synchronous grammars with slice sampling. In: HLT/NAACL 2010, Los Angeles, California, pp. 238–241 (June 2010)
Burkett, D., Blitzer, J., Klein, D.: Joint parsing and alignment with weakly synchronized grammars. In: HLT/NAACL 2010, Los Angeles, CA (June 2010)
Riesa, J., Marcu, D.: Hierarchical search for word alignment. In: ACL 2010, Uppsala, Sweden, pp. 157–166 (July 2010)
Saers, M., Nivre, J., Wu, D.: Word alignment with stochastic bracketing linear inversion transduction grammar. In: HLT/NAACL 2010, Los Angeles, California, pp. 341–344 (June 2010)
Neubig, G., Watanabe, T., Sumita, E., Mori, S., Kawahara, T.: An unsupervised model for joint phrase alignment and extraction. In: ACL/HLT 2011, Portland, Oregon (June 2011)
Neubig, G., Watanabe, T., Mori, S., Kawahara, T.: Machine translation without words through substring alignment. In: ACL 2012, Jeju, Korea (July 2012)
Galley, M., Graehl, J., Knight, K., Marcu, D., DeNeefe, S., Wang, W., Thayer, I.: Scalable inference and training of context-rich syntactic translation models. In: COLING/ACL 2006, Sydney, Australia (July 2006)
Stolcke, A., Omohundro, S.: Inducing probabilistic grammars by Bayesian model merging. In: Carrasco, R.C., Oncina, J. (eds.) ICGI 1994. LNCS, vol. 862, pp. 106–118. Springer, Heidelberg (1994)
Grünwald, P.: A minimum description length approach to grammar inference in symbolic. In: Wermter, S., Scheler, G., Riloff, E. (eds.) IJCAI-WS 1995. LNCS (LNAI), vol. 1040, pp. 203–216. Springer, Heidelberg (1996)
Si, Z., Pei, M., Yao, B., Zhu, S.-C.: Unsupervised learning of event and-or grammar and semantics from video. In: IEEE ICCV 2011 (November 2011)
Solomonoff, R.J.: A new method for discovering the grammars of phrase structure languages. In: IFIP Congress, pp. 285–289 (1959)
Rissanen, J.: A universal prior for integers and estimation by minimum description length. The Annals of Statistics 11(2), 416–431 (1983)
Shannon, C.E.: A mathematical theory of communication. The Bell System Technical Journal 27, 379–423, 623–656 (1948)
Saers, M., Nivre, J., Wu, D.: Learning stochastic bracketing inversion transduction grammars with a cubic time biparsing algorithm. In: IWPT 2009, Paris, France, pp. 29–32 (October 2009)
Fordyce, C.S.: Overview of the IWSLT 2007 evaluation campaign. In: IWSLT 2007 (2007)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39(1), 1–38 (1977)
Cocke, J.: Programming languages and their compilers: Preliminary notes. Courant Institute of Mathematical Sciences, New York University (1969)
Kasami, T.: An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-00143, Air Force Cambridge Research Laboratory (1965)
Younger, D.H.: Recognition and parsing of context-free languages in time n³. Information and Control 10(2), 189–208 (1967)
Chiang, D.: Hierarchical phrase-based translation. Computational Linguistics 33(2) (2007)
Stolcke, A.: SRILM – an extensible language modeling toolkit. In: ICSLP 2002, Denver, Colorado, pp. 901–904 (September 2002)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL 2002, Philadelphia, PA (July 2002)
Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: HLT 2002, San Diego, California, pp. 138–145 (2002)

Author information

Authors and Affiliations

Human Language Technology Center Dept. of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
Markus Saers, Karteek Addanki & Dekai Wu

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide
Research Institute for Information and Language Processing, Research Group in Computational Linguistics, University of Wolverhampton, WV1 1SB, Wolverhampton, UK
Ruslan Mitkov
Fakultät für Informatik, Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Bianca Truthe

Cite this paper

Saers, M., Addanki, K., Wu, D. (2013). Iterative Rule Segmentation under Minimum Description Length for Unsupervised Transduction Grammar Induction. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_20

DOI: https://doi.org/10.1007/978-3-642-39593-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science, Computer Science (R0)
