Efficient algorithm for learning simple regular expressions from noisy examples

Brāzma, Alvis

doi:10.1007/3-540-58520-6_69

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 872)

Abstract

We present an efficient algorithm for finding approximate repetitions in a given sequence of characters. First, we define a class of simple regular expressions, which have star-height one and contain no union operations, and a stochastic mutation process of a given length over a string of characters. Then, assuming that a given string of characters has been obtained by applying this mutation process to a sufficiently long word generated by a simple regular expression, we try to restore the expression. We prove that, to within some reasonable accuracy, this is always possible if the length of the mutation process is bounded relative to the length of the example. We provide an algorithm that restores the expression in time linear in the length of the example and no worse than quadratic in the length of the expression. We discuss some extensions of the method and possible applications to bioinformatics.
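
The problem setting described above can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it only generates a word from a union-free, star-height-one expression and then corrupts it with a fixed number of random edits. The representation of an expression as a flat list of `(text, starred)` pairs, the function names `expand` and `mutate`, and the choice of substitution, insertion, and deletion as the mutation steps are all assumptions made for illustration; the abstract does not specify the mutation process in this detail.

```python
import random


def expand(parts, repeats):
    """Expand a simple regular expression into one word.

    `parts` is a flat list of (text, starred) pairs, so there is no union
    and no nested star. For example, [("ab", False), ("cd", True), ("e", False)]
    stands for ab(cd)*e. Every starred factor is repeated `repeats` times.
    """
    return "".join(text * repeats if starred else text for text, starred in parts)


def mutate(word, steps, alphabet, rng):
    """Apply `steps` random single-character edits (a hypothetical noise model)."""
    chars = list(word)
    for _ in range(steps):
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "insert":
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(alphabet))
        elif chars:  # substitution and deletion need at least one character
            pos = rng.randrange(len(chars))
            if op == "substitute":
                chars[pos] = rng.choice(alphabet)
            else:
                del chars[pos]
    return "".join(chars)


# Usage: a long word from ab(cd)*e, corrupted by a short mutation process.
rng = random.Random(0)
word = expand([("ab", False), ("cd", True), ("e", False)], 10)
noisy = mutate(word, 3, "abcde", rng)
print(word)
print(noisy)
```

With the number of edits kept small compared with the length of the word, most copies of the repeated factor `cd` survive intact, which is the regime in which the abstract claims the expression can be restored.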

Author information

Authors and Affiliations

Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Blvd., LV-1459, Riga, Latvia
Alvis Brāzma

Editor information

Setsuo Arikawa, Klaus P. Jantke

About this paper

Cite this paper

Brāzma, A. (1994). Efficient algorithm for learning simple regular expressions from noisy examples. In: Arikawa, S., Jantke, K.P. (eds) Algorithmic Learning Theory. AII 1994, ALT 1994. Lecture Notes in Computer Science, vol 872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58520-6_69

DOI: https://doi.org/10.1007/3-540-58520-6_69
Published: 03 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58520-6
Online ISBN: 978-3-540-49030-2
eBook Packages: Springer Book Archive
