Abstract
In this paper we present a variation of the Cocke–Younger–Kasami (CYK) algorithm for the analysis of fuzzy context-free languages, applied to DNA strings. We prove that the computational order of the new algorithm is O(n) and that it uses only O(2n) memory locations. The fuzzy context-free grammar (FCFG) is obtained from the DNA. The algorithm can be used to find regulatory motifs, among other applications. To demonstrate the proposed algorithm, we present two examples. In the first, we show that it is possible to define a fuzzy grammar for a prototype DNA sequence and then find the membership grade of any arbitrary sequence against this specific pattern. In the second, we construct a fuzzy grammar from the alignment of promoters obtained by a sequence logo algorithm for the Escherichia coli K12 DNA string, and then show how the proposed method can be used to discover regulatory motifs.
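To make the notion of a membership grade concrete, the following is a minimal sketch of the classic fuzzy CYK recognizer under max-min semantics, for a grammar in Chomsky normal form. It is not the O(n) variant proposed in the paper, whose details the abstract does not give; this textbook version runs in cubic time. The function name `fuzzy_cyk`, the rule dictionaries, and the toy grammar with its grades are all hypothetical and chosen only for illustration.

```python
# Sketch only: classic cubic-time fuzzy CYK with max-min semantics.
# This is NOT the paper's O(n) algorithm; names and grammar are hypothetical.

# Terminal rules: (head, terminal) -> membership grade of the rule.
TERMINAL_RULES = {
    ("A", "T"): 1.0,
    ("A", "C"): 0.4,
    ("B", "A"): 1.0,
    ("B", "G"): 0.3,
}

# Binary rules: (head, left, right) -> membership grade of the rule.
BINARY_RULES = {
    ("S", "A", "B"): 1.0,
    ("S", "A", "X"): 1.0,
    ("X", "B", "S"): 0.9,
}


def fuzzy_cyk(word, terminal_rules, binary_rules, start="S"):
    """Return the max-min membership grade of `word` for the start symbol."""
    n = len(word)
    if n == 0:
        return 0.0
    # table[i][k] maps nonterminal -> best grade for word[i : i + k + 1].
    table = [[{} for _ in range(n)] for _ in range(n)]

    # Substrings of length 1: apply terminal rules.
    for i, symbol in enumerate(word):
        for (head, terminal), grade in terminal_rules.items():
            if terminal == symbol:
                cell = table[i][0]
                cell[head] = max(cell.get(head, 0.0), grade)

    # Longer substrings: try every split point and every binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length - 1]
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (head, b, c), grade in binary_rules.items():
                    if b in left and c in right:
                        # A derivation is as strong as its weakest rule (min);
                        # keep the strongest derivation found so far (max).
                        candidate = min(grade, left[b], right[c])
                        cell[head] = max(cell.get(head, 0.0), candidate)

    return table[0][n - 1].get(start, 0.0)


if __name__ == "__main__":
    for seq in ("TATA", "CATA", "TATG", "AAAA"):
        print(seq, fuzzy_cyk(seq, TERMINAL_RULES, BINARY_RULES))
```

In this toy grammar the prototype `TATA` gets the highest grade, sequences that deviate from it get lower grades, and a sequence with no derivation gets grade 0.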
Acknowledgments
This work was supported by the Instituto de Ciencia y Tecnologia del Distrito Federal (ICyTDF) under project No. PICCT08-22. We also acknowledge the support of the IPN (SIP-IPN, COFFA-IPN and PIFI-IPN). Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agency.
Cite this article
Molina-Lozano, H. A new fast fuzzy Cocke–Younger–Kasami algorithm for DNA strings analysis. Int. J. Mach. Learn. & Cyber. 2, 209–218 (2011). https://doi.org/10.1007/s13042-011-0042-z
DOI: https://doi.org/10.1007/s13042-011-0042-z