A new fast fuzzy Cocke–Younger–Kasami algorithm for DNA strings analysis

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

In this paper we present a variation of the Cocke–Younger–Kasami (CYK) algorithm for the analysis of fuzzy context-free languages, applied to DNA strings. We prove that the time complexity of the new algorithm is O(n) and that it uses only O(2n) memory locations. The fuzzy context-free grammar (FCFG) is obtained from the DNA. Among other applications, the algorithm can be used to find regulatory motifs. We present two examples to demonstrate the proposed algorithm. In the first, we show that it is possible to define a fuzzy grammar for a prototype DNA sequence and then compute the membership grade of any arbitrary sequence against this specific pattern. In the second, we construct a fuzzy grammar from the alignment of promoters obtained by a sequence logo algorithm for the Escherichia coli K12 DNA string, and then show how the proposed method can be used to discover regulatory motifs.
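
To make the idea of a membership grade concrete, the following is a minimal sketch of the textbook fuzzy CYK recognizer with max-min composition over a grammar in Chomsky normal form. It is not the O(n) variant proposed in the article, whose details are not given in the abstract; it runs in cubic time and uses a full table. The function name `fuzzy_cyk`, the rule dictionaries, the prototype `TATA`, and the mismatch grade 0.3 are all assumptions chosen for illustration.

```python
def fuzzy_cyk(word, unary_rules, binary_rules, start="S"):
    """Max-min membership grade of `word` in a fuzzy grammar in CNF.

    unary_rules:  {(A, terminal): grade}   for rules A -> terminal
    binary_rules: {(A, B, C): grade}       for rules A -> B C
    Textbook cubic-time algorithm, NOT the article's O(n) variant.
    """
    n = len(word)
    if n == 0:
        return 0.0
    # table[i][k] maps nonterminal -> best grade for word[i : i + k + 1]
    table = [[{} for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        cell = table[i][0]
        for (lhs, terminal), grade in unary_rules.items():
            if terminal == ch:
                cell[lhs] = max(cell.get(lhs, 0.0), grade)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length - 1]
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (lhs, b, c), grade in binary_rules.items():
                    if b in left and c in right:
                        # a derivation is as strong as its weakest step
                        cand = min(grade, left[b], right[c])
                        cell[lhs] = max(cell.get(lhs, 0.0), cand)
    return table[0][n - 1].get(start, 0.0)


# Hypothetical toy grammar for the prototype "TATA": position k matches its
# prototype base with grade 1.0 and any other base with grade 0.3.
PROTOTYPE = "TATA"
MISMATCH_GRADE = 0.3  # arbitrary illustrative value
unary_rules = {
    (f"X{k}", base): (1.0 if base == proto else MISMATCH_GRADE)
    for k, proto in enumerate(PROTOTYPE)
    for base in "ACGT"
}
binary_rules = {
    ("S", "X0", "R1"): 1.0,
    ("R1", "X1", "R2"): 1.0,
    ("R2", "X2", "X3"): 1.0,
}

exact = fuzzy_cyk("TATA", unary_rules, binary_rules)      # all positions match
one_off = fuzzy_cyk("TAGA", unary_rules, binary_rules)    # one mismatch
wrong_len = fuzzy_cyk("TAT", unary_rules, binary_rules)   # not derivable
```

Under these assumptions an exact copy of the prototype receives grade 1.0, a sequence with a mismatch receives the grade of its weakest rule, and a string the grammar cannot derive receives 0.0.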

Acknowledgments

This work was supported by the Instituto de Ciencia y Tecnologia del Distrito Federal (ICyTDF) under project No. PICCT08-22. We also acknowledge the support of the IPN (SIP-IPN, COFFA-IPN and PIFI-IPN). Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agency.

Author information

Corresponding author

Correspondence to Herón Molina-Lozano.

About this article

Cite this article

Molina-Lozano, H. A new fast fuzzy Cocke–Younger–Kasami algorithm for DNA strings analysis. Int. J. Mach. Learn. & Cyber. 2, 209–218 (2011). https://doi.org/10.1007/s13042-011-0042-z

  • DOI: https://doi.org/10.1007/s13042-011-0042-z
