Abstract
Instead of using a single common PCFG to parse all texts, we present an efficient generative probabilistic model for probabilistic context-free grammars (PCFGs) based on the Bayesian finite mixture model. We assume that there are several PCFGs that share the same CFG but have different rule probabilities. Sentences of the same article in the corpus are generated from a common multinomial distribution over these PCFGs. We derive a Markov chain Monte Carlo algorithm for this model. In the experiments, our multi-grammar model outperforms both the single-grammar model and the Inside-Outside algorithm.
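The generative story in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy grammar `RULES`, the symmetric Dirichlet priors, the hyperparameters `alpha` and `beta`, and all function names are assumptions introduced here for illustration. The sketch only shows the forward sampling process (one set of mixing weights per article, one grammar choice per sentence); it does not implement the Markov chain Monte Carlo inference.

```python
import random

# Hypothetical toy CFG shared by every mixture component (not from the paper).
# Symbols that do not appear as keys are treated as terminals (words).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("dogs",), ("cats",), ("birds",)],
    "VP": [("V", "NP"), ("V",)],
    "V": [("chase",), ("see",)],
}


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_grammar(rng, alpha=1.0):
    """One PCFG: the shared rules with its own probabilities per left-hand side.

    The Dirichlet prior with concentration `alpha` is an assumption.
    """
    return {
        lhs: sample_dirichlet(rng, alpha, len(expansions))
        for lhs, expansions in RULES.items()
    }


def sample_yield(rng, grammar, symbol="S"):
    """Expand `symbol` top-down and return the generated words."""
    if symbol not in RULES:  # terminal symbol
        return [symbol]
    rhs = rng.choices(RULES[symbol], weights=grammar[symbol], k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample_yield(rng, grammar, child))
    return words


def generate_article(rng, grammars, n_sentences, beta=1.0):
    """Draw article-level mixing weights, then one grammar index per sentence."""
    weights = sample_dirichlet(rng, beta, len(grammars))
    sentences = []
    for _ in range(n_sentences):
        z = rng.choices(range(len(grammars)), weights=weights, k=1)[0]
        sentences.append((z, sample_yield(rng, grammars[z])))
    return weights, sentences


rng = random.Random(0)
grammars = [sample_grammar(rng) for _ in range(3)]  # K = 3 is arbitrary
weights, sentences = generate_article(rng, grammars, n_sentences=5)
for z, words in sentences:
    print(z, " ".join(words))
```

Each printed line shows the index of the grammar chosen for a sentence followed by the sampled words. Because all components share `RULES`, they differ only in the probability vectors stored per left-hand side, which mirrors the "same CFG, different rule probabilities" assumption.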
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, P.L.H., Tang, Y. (2015). Bayesian Finite Mixture Models for Probabilistic Context-Free Grammars. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_16
DOI: https://doi.org/10.1007/978-3-319-18111-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science (R0)