Efficient algorithm for learning simple regular expressions from noisy examples

  • Selected Papers
  • Conference paper
Algorithmic Learning Theory (AII 1994, ALT 1994)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 872)

Abstract

We present an efficient algorithm for finding approximate repetitions in a given sequence of characters. First, we define a class of simple regular expressions, which have star-height one and contain no union operations, and a stochastic mutation process of a given length over a string of characters. Then, assuming that a given string has been obtained by corrupting, through this mutation process, some sufficiently long word generated by a simple regular expression, we try to restore the expression. We prove that, to within some reasonable accuracy, this is always possible if the length of the mutation process is bounded relative to the length of the example. We provide an algorithm that restores the expression in time linear in the length of the example and no worse than quadratic in the length of the expression. We discuss some extensions of the method and possible applications to bioinformatics.
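
The setting described in the abstract can be illustrated with a small sketch. The abstract does not give the formal definitions, so everything below is an assumption made for illustration: the encoding of a simple expression as a list of `(block, starred)` pairs, the choice of single-character substitutions, insertions and deletions as mutation steps, and the naive `best_period` heuristic. It is not the paper's linear-time algorithm; it only shows why a repetition can remain detectable when the number of mutations is small relative to the length of the example.

```python
import random

def generate(expr, repeats):
    """Expand a union-free, star-height-one expression into a word.

    `expr` is a list of (block, starred) pairs (a hypothetical encoding):
    starred blocks are repeated `repeats` times, other blocks are copied once.
    """
    return "".join(block * repeats if starred else block
                   for block, starred in expr)

def mutate(word, steps, alphabet, rng):
    """Apply `steps` random single-character edits to `word`.

    Assumed mutation model: each step substitutes, inserts, or deletes
    one character at a uniformly chosen position.
    """
    chars = list(word)
    for _ in range(steps):
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "insert" or not chars:
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(alphabet))
        elif op == "substitute":
            chars[rng.randrange(len(chars))] = rng.choice(alphabet)
        else:
            del chars[rng.randrange(len(chars))]
    return "".join(chars)

def mismatches(word, period):
    """Count positions i where word[i] differs from word[i + period]."""
    return sum(a != b for a, b in zip(word, word[period:]))

def best_period(word, max_period):
    """Naive guess of the repeated unit's length (needs max_period < len(word)).

    Returns the shift with the lowest mismatch rate; on ties the
    smallest shift wins because min() keeps the first minimum.
    """
    return min(range(1, max_period + 1),
               key=lambda p: mismatches(word, p) / (len(word) - p))

if __name__ == "__main__":
    rng = random.Random(0)
    clean = generate([("ab", False), ("cde", True), ("f", False)], 20)
    noisy = mutate(clean, 3, "abcdef", rng)
    print(noisy)
    print(best_period(noisy, 10))
```

A single edit disturbs only the few comparisons that straddle it, so the mismatch rate at the true period stays low as long as edits are sparse. This naive scan costs time proportional to `max_period` times the length of the word, which is weaker than the bound stated in the abstract.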

Editor information

Setsuo Arikawa, Klaus P. Jantke

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brāzma, A. (1994). Efficient algorithm for learning simple regular expressions from noisy examples. In: Arikawa, S., Jantke, K.P. (eds) Algorithmic Learning Theory. AII 1994, ALT 1994. Lecture Notes in Computer Science, vol 872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58520-6_69

  • DOI: https://doi.org/10.1007/3-540-58520-6_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58520-6

  • Online ISBN: 978-3-540-49030-2

  • eBook Packages: Springer Book Archive
