
Bayesian Finite Mixture Models for Probabilistic Context-Free Grammars

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)


Abstract

Instead of using a single common PCFG to parse all texts, we present an efficient generative probabilistic model for probabilistic context-free grammars (PCFGs) based on the Bayesian finite mixture model. We assume that there are several PCFGs, all sharing the same CFG but with different rule probabilities. Sentences of the same article in the corpus are generated from a common multinomial distribution over these PCFGs. We derive a Markov chain Monte Carlo algorithm for this model. In our experiments, the multi-grammar model outperforms both the single-grammar model and the Inside-Outside algorithm.
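To make the generative story concrete, the following is a minimal Python sketch of how a corpus could be sampled under such a mixture, not the authors' implementation: K component PCFGs share one CFG skeleton but carry independently Dirichlet-distributed rule probabilities, and each article draws its own multinomial over the components. All names and hyperparameters here (K, ALPHA, BETA, sample_tree, generate_article, the toy grammar) are illustrative assumptions.

# Sketch of the mixture-of-PCFGs generative process described in the
# abstract. All constants and the toy grammar are assumptions for
# illustration, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Shared CFG skeleton: nonterminal -> list of possible right-hand sides.
CFG = {
    "S":   [("NP", "VP")],
    "NP":  [("Det", "N"), ("NP", "PP")],
    "VP":  [("V", "NP"), ("VP", "PP")],
    "PP":  [("P", "NP")],
    "Det": [("the",), ("a",)],
    "N":   [("dog",), ("park",), ("telescope",)],
    "V":   [("saw",), ("walked",)],
    "P":   [("in",), ("with",)],
}

K = 3        # number of component PCFGs (assumed)
ALPHA = 1.0  # Dirichlet prior on each nonterminal's rule probabilities (assumed)
BETA = 1.0   # Dirichlet prior on per-article mixture weights (assumed)

# Each component grammar keeps the same rules but draws its own
# rule-probability vectors from the Dirichlet prior.
grammars = [
    {nt: rng.dirichlet(ALPHA * np.ones(len(rhss))) for nt, rhss in CFG.items()}
    for _ in range(K)
]

def sample_tree(grammar, symbol="S", depth=0, max_depth=12):
    """Expand `symbol` top-down using one component grammar's probabilities."""
    rhss = CFG[symbol]
    if depth >= max_depth:
        # Crude depth cap so the sketch always terminates: take the first,
        # shortest expansion instead of sampling.
        i = min(range(len(rhss)), key=lambda j: len(rhss[j]))
    else:
        i = rng.choice(len(rhss), p=grammar[symbol])
    return [sample_tree(grammar, s, depth + 1, max_depth) if s in CFG else s
            for s in rhss[i]]

def generate_article(n_sentences):
    """All sentences of one article share a multinomial over the K PCFGs."""
    theta = rng.dirichlet(BETA * np.ones(K))   # article-level mixture weights
    sentences = []
    for _ in range(n_sentences):
        z = rng.choice(K, p=theta)             # pick a component PCFG
        sentences.append(sample_tree(grammars[z]))
    return theta, sentences

theta, article = generate_article(5)
print("mixture weights:", np.round(theta, 3))

Under this sketch, a natural MCMC scheme (again, an assumption; the paper derives its own sampler) would alternate Gibbs updates: resample each sentence's component indicator z given the current rule probabilities, then resample the Dirichlet-conjugate rule probabilities and article weights given the trees currently assigned to each component.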



Author information

Correspondence to Philip L. H. Yu.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, P.L.H., Tang, Y. (2015). Bayesian Finite Mixture Models for Probabilistic Context-Free Grammars. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_16


  • DOI: https://doi.org/10.1007/978-3-319-18111-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18110-3

  • Online ISBN: 978-3-319-18111-0

  • eBook Packages: Computer Science; Computer Science (R0)
