Noise-tolerant efficient inductive synthesis of regular expressions from good examples

  • Special Issue

Abstract

We present an almost linear-time method for the inductive synthesis of simple regular expressions from a single representative (good) example. In particular, we consider the synthesis of expressions of star-height one, allowing one union operation under each iteration, and the synthesis of expressions without union operations from examples that may contain mistakes. In both cases we give sufficient conditions that precisely define the class of target expressions and the notion of a good example under which the synthesis algorithm works correctly, and we prove its correctness. In the case of expressions with unions the proof rests on novel results in the combinatorics of words. A generalized algorithm that can synthesize simple expressions containing unions from noisy examples has been implemented as a computer program. Computer experiments show that the algorithm is quite practical and may have applications in genome informatics.
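
As a rough illustration of the flavour of such synthesis (a minimal sketch, not the authors' almost-linear-time algorithm), the fragment below collapses sufficiently long periodic runs in an example string into iterated sub-expressions, e.g. turning abcabcabcxyz into (abc)*xyz. The function name synthesize and the min_repeats threshold are hypothetical choices made only for this example; expressions with unions and noise tolerance are not handled here.

```python
import re

def synthesize(example: str, min_repeats: int = 3) -> str:
    """Collapse long periodic runs in `example` into iterated sub-expressions.

    A purely illustrative, quadratic-time sketch: at each position we look for
    the longest run that repeats a block at least `min_repeats` times and
    replace it by "(block)*"; everything else is kept as a literal.
    """
    out, i, n = [], 0, len(example)
    while i < n:
        best_period, best_run = 0, 0
        # Try every candidate period length p for a run starting at i.
        for p in range(1, (n - i) // min_repeats + 1):
            run = p
            while i + run < n and example[i + run] == example[i + run - p]:
                run += 1
            if run >= min_repeats * p and run > best_run:
                best_period, best_run = p, run
        if best_period:
            out.append("(" + re.escape(example[i:i + best_period]) + ")*")
            i += best_run
        else:
            out.append(re.escape(example[i]))
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(synthesize("abcabcabcxyz"))  # (abc)*xyz
    print(synthesize("aaaab"))         # (a)*b
```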

Author information

Alvis Brāzma: He received the Ph.D. degree in computer science from Moscow State University in 1988 and the M.S. degree in mathematics from the University of Latvia in 1982. He has worked at the University of Latvia, at New Mexico State University, and at Helsinki University. Since 1995 he has been an Associate Professor at the University of Latvia. He has publications on program synthesis, machine learning, string algorithms, graph drawing, and computational biology. His current scientific interests are machine learning applications in computational biology, including automatic pattern discovery in biological data.

Kārlis Čerāns: He received the Dr.sc.comp. degree from the University of Latvia in 1992. Since 1995 he has been an Associate Professor at the University of Latvia. His main research interests are in programming theory: specification and analysis of real-time systems, and algorithmic analysis problems for extended automata (infinite-state systems). He also has publications in the fields of inductive inference of recursive functions and machine learning.

About this article

Cite this article

Brāzma, A., Čerāns, K. Noise-tolerant efficient inductive synthesis of regular expressions from good examples. New Gener Comput 15, 105–140 (1997). https://doi.org/10.1007/BF03037562
