Abstract
The rapid growth of protein sequence databases is outpacing the capacity to characterize new proteins biochemically and structurally. It is therefore important to develop tools that locate, within protein sequences, those subsequences associated with a function or specific feature. In this work, we propose a method to predict one such functional motif, the coiled coil, which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer, which is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
Work supported by the CICYT TIC2000-1153 and the Generalitat Valenciana GV06/068.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peris, P., López, D., Campos, M., Sempere, J.M. (2006). Protein Motif Prediction by Grammatical Inference. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_15
DOI: https://doi.org/10.1007/11872436_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science (R0)