A survey of grammatical inference methods for natural language learning

D’Ulizia, Arianna; Ferri, Fernando; Grifoni, Patrizia

doi:10.1007/s10462-010-9199-1

Published: 06 January 2011

Volume 36, pages 1–27 (2011)

Artificial Intelligence Review

Abstract

The high complexity of natural language and the large amount of human effort and time needed to produce grammars have led several researchers in Natural Language Processing to investigate ways of automating grammar generation and updating. Many algorithms for Context-Free Grammar inference have been developed in the literature. This paper surveys the methodologies for inferring context-free grammars from examples that researchers developed in the last decade. After introducing some preliminary definitions and notation concerning learning and inductive inference, it describes some of the most relevant existing grammatical inference methods for Natural Language and classifies them according to the kind of presentation (text or informant) and the type of information (supervised, unsupervised, or semi-supervised). It then presents the state of the art in strategies for evaluating and comparing different grammar inference methods. The goal of the paper is to give the reader an introduction to the major concepts and current approaches in Natural Language Learning research.
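The distinction between text and informant presentation can be illustrated with a small sketch. This is not code from the survey: the toy grammars, the sample sentences, and the helper names `cyk_accepts` and `consistent` are hypothetical, and grammars are assumed to be in Chomsky normal form so that membership can be tested with the CYK algorithm. Under text presentation the learner sees only positive strings; under informant presentation it also sees strings labelled as non-members, so a candidate grammar must accept every positive example and reject every negative one.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical grammar representation (Chomsky normal form assumed):
# each nonterminal maps to right-hand sides that are either a 1-tuple
# (a terminal word) or a 2-tuple (two nonterminals).
Grammar = Dict[str, List[Tuple[str, ...]]]


def cyk_accepts(grammar: Grammar, start: str, words: Sequence[str]) -> bool:
    """CYK recognition: True if `words` can be derived from `start`."""
    n = len(words)
    if n == 0:
        return False  # this sketch has no epsilon rules
    # table[i][k] holds the nonterminals deriving words[i : i + k + 1]
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for lhs, rhss in grammar.items():
            if (word,) in rhss:
                table[i][0].add(lhs)
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):      # span start
            for split in range(1, length):   # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


def consistent(grammar: Grammar, start: str,
               positives: Sequence[str], negatives: Sequence[str] = ()) -> bool:
    """Text presentation: pass only positives.
    Informant presentation: also pass labelled negatives."""
    def accepts(sentence: str) -> bool:
        return cyk_accepts(grammar, start, sentence.split())
    return all(accepts(s) for s in positives) and not any(accepts(s) for s in negatives)


# Hypothetical target grammar for a tiny fragment of English.
target: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("sleeps",)],
    "Det": [("the",)],
    "N": [("dog",), ("cat",)],
    "V": [("sees",)],
}
# Overgeneral hypothesis S -> S S | <any word>: it accepts every
# non-empty word sequence over the vocabulary.
overgeneral: Grammar = {
    "S": [("S", "S")] + [(w,) for w in ("the", "dog", "cat", "sees", "sleeps")]
}

positives = ["the dog sleeps", "the cat sees the dog"]
negatives = ["dog the sleeps"]

print(consistent(target, "S", positives))                  # True
print(consistent(overgeneral, "S", positives))             # True
print(consistent(target, "S", positives, negatives))       # True
print(consistent(overgeneral, "S", positives, negatives))  # False
```

Both hypotheses fit the positive sample, so in this example the text alone cannot rule out the overgeneral grammar, while a single negative example supplied by an informant does. This mirrors the classical learnability observation that overgeneralization is hard to detect from positive data alone.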

Author information

Authors and Affiliations

Consiglio Nazionale delle Ricerche – Istituto di Ricerche sulla Popolazione e le Politiche Sociali, Via Palestro 32, 00185, Rome, Italy
Arianna D’Ulizia, Fernando Ferri & Patrizia Grifoni

Corresponding author

Correspondence to Patrizia Grifoni.

About this article

Cite this article

D’Ulizia, A., Ferri, F. & Grifoni, P. A survey of grammatical inference methods for natural language learning. Artif Intell Rev 36, 1–27 (2011). https://doi.org/10.1007/s10462-010-9199-1

Issue Date: June 2011
DOI: https://doi.org/10.1007/s10462-010-9199-1