Bayesian Finite Mixture Models for Probabilistic Context-Free Grammars

Yu, Philip L. H.; Tang, Yaohua

doi:10.1007/978-3-319-18111-0_16


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Instead of using a single PCFG to parse all texts, we present an efficient generative model for probabilistic context-free grammars (PCFGs) based on the Bayesian finite mixture model. We assume that there are several PCFGs, all sharing the same underlying CFG but each with different rule probabilities. Sentences of the same article in the corpus are generated from a common multinomial distribution over these PCFGs. We derive a Markov chain Monte Carlo algorithm for this model. In our experiments, the multi-grammar model outperforms both the single-grammar model and the Inside-Outside algorithm.
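The generative story described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy grammar, the symmetric Dirichlet prior on each PCFG's rule probabilities (`alpha`), the Dirichlet prior on each article's multinomial over grammars (`beta`), and all function names are hypothetical choices made here. The paper's MCMC inference algorithm is not reproduced.

```python
import random

# Hypothetical toy CFG shared by every mixture component; any symbol that is
# not a key in RULES is treated as a terminal word.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("N",)],
    "VP": [("V", "NP"), ("V",)],
    "Det": [("the",), ("a",)],
    "N": [("dog",), ("cat",), ("idea",)],
    "V": [("sees",), ("sleeps",)],
}


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_rule_probs(rules, alpha, rng):
    """Give one PCFG its own rule probabilities over the shared CFG.

    Assumption: a symmetric Dirichlet(alpha) prior per left-hand side.
    """
    return {lhs: sample_dirichlet([alpha] * len(rhss), rng)
            for lhs, rhss in rules.items()}


def generate(symbol, rules, probs, rng):
    """Expand `symbol` top-down using the given rule probabilities; return words."""
    if symbol not in rules:  # terminal: emit the word itself
        return [symbol]
    rhss = rules[symbol]
    idx = rng.choices(range(len(rhss)), weights=probs[symbol])[0]
    words = []
    for child in rhss[idx]:
        words.extend(generate(child, rules, probs, rng))
    return words


def generate_corpus(num_articles, sents_per_article, k, alpha=1.0, beta=1.0, seed=0):
    """Sample k PCFGs over the shared CFG, then articles whose sentences each pick
    a PCFG from one article-level multinomial theta (assumed theta ~ Dirichlet(beta))."""
    rng = random.Random(seed)
    grammars = [sample_rule_probs(RULES, alpha, rng) for _ in range(k)]
    corpus = []
    for _ in range(num_articles):
        theta = sample_dirichlet([beta] * k, rng)  # one multinomial per article
        article = []
        for _ in range(sents_per_article):
            z = rng.choices(range(k), weights=theta)[0]  # grammar for this sentence
            article.append((z, generate("S", RULES, grammars[z], rng)))
        corpus.append(article)
    return grammars, corpus


if __name__ == "__main__":
    _, corpus = generate_corpus(num_articles=2, sents_per_article=3, k=3)
    for i, article in enumerate(corpus):
        for z, words in article:
            print(f"article {i}, grammar {z}: {' '.join(words)}")
```

All components reuse the same rule set and differ only in their rule probabilities, which mirrors the abstract's "same CFG but with different rule probabilities" assumption. Inference would run this process in reverse, recovering each sentence's grammar indicator `z` and parse from observed text.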



Author information

Authors and Affiliations

Department of Statistics & Actuarial Science, The University of Hong Kong, Pok Fu Lam, Hong Kong, China
Philip L. H. Yu & Yaohua Tang


Corresponding author

Correspondence to Philip L. H. Yu.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Yu, P.L.H., Tang, Y. (2015). Bayesian Finite Mixture Models for Probabilistic Context-Free Grammars. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_16


DOI: https://doi.org/10.1007/978-3-319-18111-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science, Computer Science (R0)
