Abstract
A classical problem in grammatical inference is to identify a language from a set of examples. In this paper, we address the problem of identifying a union of languages from examples that belong to several different unknown languages. Decomposing a language into smaller pieces that are easier to represent should make learning easier than aiming for a single, overly general language. In particular, we consider k-testable languages in the strict sense (k-TSS). These are defined by sets of allowed prefixes, infixes (substrings) and suffixes that words in the language may contain. We establish a Galois connection between the lattice of all languages over an alphabet \(\varSigma \) and the lattice of k-TSS languages over \(\varSigma \). We also define a simple metric on k-TSS languages. Together, the Galois connection and the metric allow us to derive an efficient algorithm for learning unions of k-TSS languages. We evaluate our algorithm on an industrial dataset and thereby demonstrate the relevance of our approach.
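To make the k-TSS representation and the lattice operations more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes one common convention for k-TSS representations: words shorter than k are accepted only if they are listed explicitly, and longer words must start with an allowed prefix of length k-1, end with an allowed suffix of length k-1, and contain only allowed infixes of length k. The names `KTSS`, `alpha`, `le` and `join` are hypothetical. Under this convention, `alpha(S)` computes the smallest representation whose language contains a finite sample `S`, so `alpha(S).le(Z)` holds exactly when `Z` accepts every word of `S`. This is the abstraction side of a Galois connection with componentwise inclusion as the order, and `join` (componentwise union) is the least upper bound.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class KTSS:
    """Hypothetical k-TSS representation under the convention assumed above."""
    k: int
    prefixes: FrozenSet[str]  # allowed prefixes of length k-1
    suffixes: FrozenSet[str]  # allowed suffixes of length k-1
    infixes: FrozenSet[str]   # allowed substrings of length k
    short: FrozenSet[str]     # accepted words shorter than k (assumed convention)

    def accepts(self, w: str) -> bool:
        k = self.k
        if len(w) < k:
            # Short words are decided by explicit enumeration only.
            return w in self.short
        return (
            w[: k - 1] in self.prefixes
            and w[len(w) - (k - 1):] in self.suffixes
            and all(w[i : i + k] in self.infixes for i in range(len(w) - k + 1))
        )

    def le(self, other: "KTSS") -> bool:
        """Componentwise inclusion: the order on representations."""
        return (
            self.k == other.k
            and self.prefixes <= other.prefixes
            and self.suffixes <= other.suffixes
            and self.infixes <= other.infixes
            and self.short <= other.short
        )

    def join(self, other: "KTSS") -> "KTSS":
        """Least upper bound: componentwise union of the four sets."""
        if self.k != other.k:
            raise ValueError("cannot join representations with different k")
        return KTSS(
            self.k,
            self.prefixes | other.prefixes,
            self.suffixes | other.suffixes,
            self.infixes | other.infixes,
            self.short | other.short,
        )


def alpha(sample: Iterable[str], k: int) -> KTSS:
    """Smallest representation (under le) whose language contains every word of sample."""
    if k < 2:
        raise ValueError("k must be at least 2")
    prefixes, suffixes, infixes, short = set(), set(), set(), set()
    for w in sample:
        if len(w) < k:
            short.add(w)
        else:
            prefixes.add(w[: k - 1])
            suffixes.add(w[len(w) - (k - 1):])
            infixes.update(w[i : i + k] for i in range(len(w) - k + 1))
    return KTSS(k, frozenset(prefixes), frozenset(suffixes),
                frozenset(infixes), frozenset(short))


if __name__ == "__main__":
    z1 = alpha(["ab", "abab"], k=2)
    z2 = alpha(["ba"], k=2)
    merged = z1.join(z2)
    # "aba" belongs to neither language, yet the single joined k-TSS language accepts it.
    print(z1.accepts("aba"), z2.accepts("aba"), merged.accepts("aba"))  # False False True
```

The usage example illustrates the motivation stated in the abstract: merging examples from two sources into one k-TSS language overgeneralizes (it accepts "aba"), whereas keeping the union of two separate k-TSS languages does not.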
This research is supported by the Dutch Technology Foundation (STW) under the Robust CPS program (project 12693).
Notes
1. For missing proofs, see http://arxiv.org/abs/1812.08269.
2.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Linard, A., de la Higuera, C., Vaandrager, F. (2019). Learning Unions of k-Testable Languages. In: Martín-Vide, C., Okhotin, A., Shapira, D. (eds) Language and Automata Theory and Applications. LATA 2019. Lecture Notes in Computer Science(), vol 11417. Springer, Cham. https://doi.org/10.1007/978-3-030-13435-8_24
DOI: https://doi.org/10.1007/978-3-030-13435-8_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13434-1
Online ISBN: 978-3-030-13435-8
eBook Packages: Computer Science, Computer Science (R0)