Protein Motif Prediction by Grammatical Inference

Peris, Piedachu; López, Damián; Campos, Marcelino; Sempere, José M.

doi:10.1007/11872436_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4201)

Included in the following conference series:

International Colloquium on Grammatical Inference


Abstract

The rapid growth of protein sequence databases is outpacing the capacity to characterize new proteins biochemically and structurally. It is therefore very important to develop tools that locate, within protein sequences, those subsequences with an associated function or specific feature. In this work, we propose a method to predict one such functional motif, the coiled coil, which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer, which is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
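As an illustration of why even linear languages can be learned with regular-language techniques, the sketch below shows the usual folding transformation: the i-th symbol from the left is paired with the i-th symbol from the right, so that a string generated by rules of the form A → aBb becomes a string over an alphabet of symbol pairs. This is a minimal sketch of that general reduction only; the function names `fold` and `unfold` and the example sequence are illustrative assumptions, and the abstract does not state how the authors implement this step or build the transducer.

```python
from typing import List, Tuple, Union

# A folded symbol is either a pair (left, right) or a single middle symbol.
FoldedSymbol = Union[Tuple[str, str], str]


def fold(sequence: str) -> List[FoldedSymbol]:
    """Pair the i-th symbol from the left with the i-th from the right.

    An odd-length sequence keeps its central symbol unpaired at the end.
    """
    folded: List[FoldedSymbol] = []
    left, right = 0, len(sequence) - 1
    while left < right:
        folded.append((sequence[left], sequence[right]))
        left += 1
        right -= 1
    if left == right:
        folded.append(sequence[left])
    return folded


def unfold(folded: List[FoldedSymbol]) -> str:
    """Invert fold(): rebuild the original sequence from its folded form."""
    prefix: List[str] = []
    suffix: List[str] = []
    middle = ""
    for symbol in folded:
        if isinstance(symbol, tuple):
            prefix.append(symbol[0])
            suffix.append(symbol[1])
        else:
            middle = symbol
    # Right-hand symbols were collected outermost first, so reverse them.
    return "".join(prefix) + middle + "".join(reversed(suffix))


if __name__ == "__main__":
    example = "LEKAL"  # hypothetical amino-acid fragment, not from the paper
    print(fold(example))
    print(unfold(fold(example)))
```

Because the transformation is invertible, any regular-language inference method applied to the folded strings yields a model that can be read back as an even linear grammar over the original alphabet.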

Work supported by the CICYT TIC2000-1153 and the Generalitat Valenciana GV06/068.

Author information

Authors and Affiliations

Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, 46071 Valencia, Spain
Piedachu Peris, Damián López, Marcelino Campos & José M. Sempere

Editor information

Editors and Affiliations

Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, 223-8522, Yokohama, Japan
Yasubumi Sakakibara
Dept. of Computer Science, Kyoto Sangyo University, Kamigamo Motoyama, Kita-ku, Kyoto, Japan
Satoshi Kobayashi
Japan Biological Informatics Consortium, 10F TIME24 Building, 2-45 Aomi, Koto-ku, 135-8073, Tokyo, Japan
Kengo Sato
Department of Information and Communication Engineering, Graduate School of Electro-Communications, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, 182-8585, Tokyo, Japan
Tetsuro Nishino
Department of Information and Communication Engineering, Faculty of Electro-Communications, The University of Electro-Communications, Chofugaoka 1–5–1, Chofu, 182-8585, Tokyo, Japan
Etsuji Tomita

Cite this paper

Peris, P., López, D., Campos, M., Sempere, J.M. (2006). Protein Motif Prediction by Grammatical Inference. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_15

DOI: https://doi.org/10.1007/11872436_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science
