
Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5278)


Abstract

This paper presents PCFG-BCL, an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus. Our analysis shows that this procedure is greedy: each set of rules added to the grammar yields the largest increase in the posterior of the grammar given the training corpus. Experiments on several benchmark datasets show that PCFG-BCL is competitive with existing methods for unsupervised CFG learning.
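To make "biclustering of bigrams" concrete, the following Python sketch builds a bigram count table from a toy corpus, grows a fully populated bicluster around the most frequent bigram, replaces the covered bigrams with a new symbol, and estimates relative frequencies for the symbols on each side. It is only an illustration under stated assumptions: the toy corpus, the function names, the symbol name `NP`, and the "every cell must be nonzero" growth rule are all hypothetical, and the growth rule is a simple stand-in rather than the posterior-gain criterion the abstract describes.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent symbol pairs over all sentences."""
    counts = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            counts[(left, right)] += 1
    return counts


def grow_bicluster(counts):
    """Grow a bicluster (rows, cols) from the most frequent bigram.

    Assumption: a row or column is added only if every cell it
    contributes is nonzero. This is a stand-in heuristic, not the
    posterior-gain criterion used by PCFG-BCL.
    """
    (a, b), _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    rows, cols = {a}, {b}
    lefts = sorted({l for l, _ in counts})
    rights = sorted({r for _, r in counts})
    changed = True
    while changed:
        changed = False
        for l in lefts:
            if l not in rows and all(counts.get((l, c), 0) > 0 for c in cols):
                rows.add(l)
                changed = True
        for r in rights:
            if r not in cols and all(counts.get((x, r), 0) > 0 for x in rows):
                cols.add(r)
                changed = True
    return rows, cols


def side_probabilities(counts, rows, cols):
    """Relative frequency of each row and column symbol inside the bicluster."""
    total = sum(counts.get((l, r), 0) for l in rows for r in cols)
    row_p = {l: sum(counts.get((l, r), 0) for r in cols) / total for l in rows}
    col_p = {r: sum(counts.get((l, r), 0) for l in rows) / total for r in cols}
    return row_p, col_p


def reduce_corpus(corpus, rows, cols, symbol):
    """Replace each bigram covered by the bicluster with a new symbol."""
    reduced = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if i + 1 < len(sentence) and sentence[i] in rows and sentence[i + 1] in cols:
                out.append(symbol)
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        reduced.append(out)
    return reduced


# Hypothetical toy corpus, used only for illustration.
corpus = [
    ["the", "dog", "runs"],
    ["a", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
    ["a", "cat", "runs"],
    ["the", "dog", "sleeps"],
]
counts = bigram_counts(corpus)
rows, cols = grow_bicluster(counts)
row_p, col_p = side_probabilities(counts, rows, cols)
reduced = reduce_corpus(corpus, rows, cols, "NP")
```

On this corpus the bicluster pairs the symbols {a, the} with {cat, dog}, and every sentence reduces to the new symbol followed by a verb. Running the same steps again on the reduced corpus is what makes the procedure iterative.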




Editor information

Alexander Clark, François Coste, Laurent Miclet


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tu, K., Honavar, V. (2008). Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering. In: Clark, A., Coste, F., Miclet, L. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2008. Lecture Notes in Computer Science, vol 5278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88009-7_18


  • DOI: https://doi.org/10.1007/978-3-540-88009-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88008-0

  • Online ISBN: 978-3-540-88009-7

  • eBook Packages: Computer Science, Computer Science (R0)
