Abstract
We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set $S = \{S_1, S_2, \ldots, S_r\}$. To find these common motifs, we must first identify the factors that exist in each string. The algorithm therefore begins by constructing a factor automaton for each string $S_i$. To find the factors common to all the strings, the algorithm must gather the factors of every string into one data structure. It does this by computing an automaton that accepts the union of the languages of the above-mentioned factor automata. Using this automaton, we create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string $S_i$ that accepts the sequences of non-overlapping factors occurring in that string. The intersection of the latter automata yields a finite automaton that accepts all the common subsequences with gaps, over the factor alphabet, that are present in all the strings of $S$. These common subsequences are the common motifs of the strings.
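The steps described in the abstract can be illustrated with a small Python sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation. Explicit factor sets stand in for the factor automata (they accept the same languages). A dictionary mapping each factor to the indices of the strings containing it stands in for the union automaton. The factor alphabet is assumed to be the factors shared by all strings with length at least `min_len`. Each per-string automaton is a DFA whose state is the earliest position at which the next factor may start. The intersection is the usual product construction over these DFAs, explored depth-first up to `max_factors` factors. All names (`factors`, `union_of_factors`, `factor_alphabet`, `step`, `common_motifs`, `min_len`, `max_factors`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def factors(s: str, min_len: int = 1) -> Set[str]:
    """All factors of s with length >= min_len.

    Stand-in (assumption) for the language of the factor automaton of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}


def union_of_factors(strings: List[str], min_len: int = 1) -> Dict[str, Set[int]]:
    """Map each factor to the set of indices of the strings containing it.

    Stand-in (assumption) for the union automaton: each entry plays the
    role of a state labelled with the strings whose factor automata reach it."""
    occurs: Dict[str, Set[int]] = {}
    for idx, s in enumerate(strings):
        for f in factors(s, min_len):
            occurs.setdefault(f, set()).add(idx)
    return occurs


def factor_alphabet(strings: List[str], min_len: int = 1) -> List[str]:
    """Assumed factor alphabet: factors that occur in every string."""
    occurs = union_of_factors(strings, min_len)
    return sorted(f for f, ids in occurs.items() if len(ids) == len(strings))


def step(s: str, pos: int, f: str) -> Optional[int]:
    """Transition of the per-string DFA over the factor alphabet.

    Reading factor f from state pos jumps to the end of the leftmost
    occurrence of f starting at or after pos, so consecutive factors never
    overlap (gaps of length >= 0 are allowed). None is the dead state."""
    i = s.find(f, pos)
    return None if i < 0 else i + len(f)


def common_motifs(strings: List[str], min_len: int = 1,
                  max_factors: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate words accepted by the product (intersection) automaton."""
    alphabet = factor_alphabet(strings, min_len)
    results: List[Tuple[str, ...]] = []

    def dfs(state: Tuple[Optional[int], ...], word: Tuple[str, ...]) -> None:
        # Every non-dead product state is final, so each word reached is accepted.
        if word:
            results.append(word)
        if len(word) == max_factors:
            return
        for f in alphabet:
            # Product transition: advance every per-string DFA on the same factor.
            nxt = tuple(step(s, p, f) for s, p in zip(strings, state))
            if None not in nxt:
                dfs(nxt, word + (f,))

    dfs(tuple(0 for _ in strings), ())
    return results
```

For example, `common_motifs(["abcab", "cabxab"], min_len=2, max_factors=2)` returns `[('ab',), ('ab', 'ab'), ('ca',), ('cab',)]`. The motif `('ab', 'ab')` occurs with the gap `c` in the first string and with the gap `x` in the second. Following only the leftmost occurrence is safe because an earlier end position never rules out a continuation that a later end position would allow. This is what makes each per-string automaton deterministic.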
This research has been partially supported by the Ministry of Education, Youth and Sports under research program MSM 6840770014 and the Czech Science Foundation as project No. 201/06/1039.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Antoniou, P., Holub, J., Iliopoulos, C.S., Melichar, B., Peterlongo, P. (2006). Finding Common Motifs with Gaps Using Finite Automata. In: Ibarra, O.H., Yen, HC. (eds) Implementation and Application of Automata. CIAA 2006. Lecture Notes in Computer Science, vol 4094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11812128_8
DOI: https://doi.org/10.1007/11812128_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37213-4
Online ISBN: 978-3-540-37214-1
eBook Packages: Computer Science; Computer Science (R0)