Noise-tolerant efficient inductive synthesis of regular expressions from good examples

  • Special Issue

Abstract

We present an almost linear-time method for the inductive synthesis of simple regular expressions from a single representative (good) example. In particular, we consider the synthesis of expressions of star-height one, allowing one union operation under each iteration, and the synthesis of expressions without union operations from examples that may contain mistakes. In both cases we give sufficient conditions that precisely define the class of target expressions and the notion of a good example under which the synthesis algorithm works correctly, and we prove its correctness. In the case of expressions with unions the proof rests on novel results in the combinatorics of words. A generalized algorithm that can synthesize simple expressions containing unions from noisy examples has been implemented as a computer program. Computer experiments show that the algorithm is quite practical and may have applications in genome informatics.
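
As a rough illustration of the flavour of such synthesis (a minimal sketch, not the authors' almost-linear-time algorithm), the fragment below collapses sufficiently long periodic runs in an example string into iterated sub-expressions, e.g. turning abcabcabcxyz into (abc)*xyz. The function name synthesize and the min_repeats threshold are hypothetical choices made only for this example; expressions with unions and noise tolerance are not handled here.

```python
import re

def synthesize(example: str, min_repeats: int = 3) -> str:
    """Collapse long periodic runs in `example` into iterated sub-expressions.

    A purely illustrative, quadratic-time sketch: at each position we look for
    the longest run that repeats a block at least `min_repeats` times and
    replace it by "(block)*"; everything else is kept as a literal.
    """
    out, i, n = [], 0, len(example)
    while i < n:
        best_period, best_run = 0, 0
        # Try every candidate period length p for a run starting at i.
        for p in range(1, (n - i) // min_repeats + 1):
            run = p
            while i + run < n and example[i + run] == example[i + run - p]:
                run += 1
            if run >= min_repeats * p and run > best_run:
                best_period, best_run = p, run
        if best_period:
            out.append("(" + re.escape(example[i:i + best_period]) + ")*")
            i += best_run
        else:
            out.append(re.escape(example[i]))
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(synthesize("abcabcabcxyz"))  # (abc)*xyz
    print(synthesize("aaaab"))         # (a)*b
```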

Author information

Alvis Brāzma: He received the Ph.D. degree in computer science from Moscow State University in 1988 and the M.S. degree in mathematics from the University of Latvia in 1982. He has worked at the University of Latvia, at New Mexico State University, and at Helsinki University. Since 1995 he has been an Associate Professor at the University of Latvia. He has publications on program synthesis, machine learning, string algorithms, graph drawing, and computational biology. His current scientific interests are machine learning applications in computational biology, including automatic pattern discovery in biological data.

Kārlis Čerāns: He received the Dr.sc.comp. degree from the University of Latvia in 1992. Since 1995 he has been an Associate Professor at the University of Latvia. His main research interests are in programming theory: specification and analysis of real-time systems, and algorithmic analysis problems for extended automata (infinite-state systems). He also has publications in the fields of inductive inference of recursive functions and machine learning.

About this article

Cite this article

Brāzma, A., Čerāns, K. Noise-tolerant efficient inductive synthesis of regular expressions from good examples. New Gener Comput 15, 105–140 (1997). https://doi.org/10.1007/BF03037562
