Abstract
Learnability is a vital property of formal grammars: representation classes should be defined in such a way that they are learnable. One way to build learnable representations is to make them objective or empiricist: the structure of the representation should be based on the structure of the language. Rather than defining a function from representation to language, we should start by defining a function from the language to the representation; following this strategy gives classes of representations that are easy to learn. We illustrate this approach with three classes, defined in analogy to the lowest three levels of the Chomsky hierarchy. First, we recall the canonical deterministic finite automaton, where the states of the automaton correspond to the right congruence classes of the language. Second, we define context-free grammars where the non-terminals of the grammar correspond to the syntactic congruence classes, and where the productions are defined by the syntactic monoid. Finally, we define a residuated lattice structure from the Galois connection between strings and contexts, which we call the syntactic concept lattice, and base a representation on this; it allows us to define a class of languages that includes some non-context-free languages, many context-free languages and all regular languages. All three classes are efficiently learnable under suitable learning paradigms.
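To make the Galois connection between strings and contexts concrete, the following is a minimal sketch rather than the paper's construction. It assumes a toy language {a^n b^n : n >= 1}, a small finite set of strings `K` and a small finite set of contexts `F`, all chosen purely for illustration; the function names are hypothetical. A context is a pair (l, r), and it accepts a string w when l + w + r is in the language.

```python
def in_language(w: str) -> bool:
    """Membership oracle for the toy language {a^n b^n : n >= 1} (an assumption)."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n


def contexts_of(strings, contexts):
    """Polar map S -> S': the contexts in `contexts` that accept every string in S."""
    return frozenset(
        (l, r) for (l, r) in contexts
        if all(in_language(l + w + r) for w in strings)
    )


def strings_of(ctxs, strings):
    """Polar map C -> C': the strings in `strings` accepted by every context in C."""
    return frozenset(
        w for w in strings
        if all(in_language(l + w + r) for (l, r) in ctxs)
    )


def closure(S, K, F):
    """Closure S'' of a set of strings, relative to the finite sets K and F."""
    return strings_of(contexts_of(S, F), K)


# Illustrative finite sets of strings and contexts (assumptions, not from the paper).
K = ["", "a", "b", "ab", "aab", "abb", "aabb"]
F = [("", ""), ("a", "b"), ("a", ""), ("", "b")]

# A closed set of strings paired with its set of contexts is one lattice element.
closed_ab = closure({"ab"}, K, F)   # strings that share every context of "ab"
closed_a = closure({"a"}, K, F)     # strings that share every context of "a"
print(sorted(closed_ab), sorted(closed_a))
```

Within these finite sets, "ab" and "aabb" end up in the same closed set because every context in `F` that accepts one also accepts the other, and likewise for "a" and "aab". Restricting to finite `K` and `F` only approximates the lattice of the full language, which is defined over all strings and all contexts.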
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Clark, A. (2010). Three Learnable Models for the Description of Language. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_2
DOI: https://doi.org/10.1007/978-3-642-13089-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13088-5
Online ISBN: 978-3-642-13089-2
eBook Packages: Computer Science, Computer Science (R0)