Topological maps of protein sequences

Ferrán, E. A.; Ferrara, P.

doi:10.1007/BF00204658

Published: October 1991

Volume 65, pages 451–458, (1991)

Biological Cybernetics



Abstract

A new method based on neural networks for clustering proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The components (x, y) of these 20×20 matrix patterns are the normalized frequencies of all pairs xy of amino acids in each sequence. We investigate the influence of different learning parameters on the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except those where the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5%±7% of its amino acids and for fragments representing 7.5%±3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained with the Kohonen algorithm to obtain topological maps of protein sequences, in which related proteins end up associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities for finding rapid and efficient algorithms to organize the whole protein database and search it for homologies.
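The pipeline outlined in the abstract (a 20×20 pair-frequency pattern per sequence, fed to a Kohonen map) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: pairs are taken to be adjacent residues, patterns are scaled to unit Euclidean length, the winner is the neuron at the smallest Euclidean distance, the neighborhood is a square that shrinks linearly, and the 3×3 map size, learning rate and epoch count are arbitrary. The function names `dipeptide_pattern`, `winner` and `train_som` are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_pattern(sequence):
    """Flattened 20x20 matrix of adjacent-pair counts, scaled to unit length.

    Assumption: "pairs xy" means adjacent residues, and "normalized" means
    unit Euclidean norm; the abstract states neither explicitly.
    """
    counts = [0.0] * 400
    for x, y in zip(sequence, sequence[1:]):
        if x in INDEX and y in INDEX:  # skip non-standard symbols
            counts[INDEX[x] * 20 + INDEX[y]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts


def winner(weights, pattern):
    """Index of the neuron whose synaptic vector is closest to the pattern."""
    def dist2(w):
        return sum((wi - pi) ** 2 for wi, pi in zip(w, pattern))
    return min(range(len(weights)), key=lambda k: dist2(weights[k]))


def train_som(patterns, rows=3, cols=3, epochs=50, alpha0=0.5, radius0=1, seed=0):
    """Train a small rectangular Kohonen map and return its synaptic vectors."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    # Small random initial synaptic vectors (an arbitrary choice).
    weights = [[rng.random() * 0.01 for _ in range(dim)]
               for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        alpha = alpha0 * frac           # learning rate decays linearly
        radius = round(radius0 * frac)  # neighborhood shrinks over time
        for p in patterns:
            k = winner(weights, p)
            kr, kc = divmod(k, cols)
            for j, w in enumerate(weights):
                jr, jc = divmod(j, cols)
                # Update the winner and its square neighborhood toward p.
                if max(abs(jr - kr), abs(jc - kc)) <= radius:
                    for i in range(dim):
                        w[i] += alpha * (p[i] - w[i])
    return weights


if __name__ == "__main__":
    # Made-up toy sequences, not real proteins and not the paper's data.
    toy = ["ACACACACAC", "ACACACAAAC", "WYWYWYWYWY", "WYWYWWWYWY"]
    patterns = [dipeptide_pattern(s) for s in toy]
    som = train_som(patterns)
    for s, p in zip(toy, patterns):
        print(s, "-> neuron", winner(som, p))
```

Once trained, classifying a new sequence needs only one pattern computation and one nearest-neuron lookup, which is consistent with the abstract's point that the trained network can classify new sequences rapidly.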


Author information

Authors and Affiliations

Sanofi Elf Bio Recherches, Labège Innopole, BP 137, F-31328, Labège Cedex, France
E. A. Ferrán & P. Ferrara

About this article

Cite this article

Ferrán, E.A., Ferrara, P. Topological maps of protein sequences. Biol. Cybern. 65, 451–458 (1991). https://doi.org/10.1007/BF00204658

Received: 24 January 1991
Accepted: 10 June 1991
Issue Date: October 1991
DOI: https://doi.org/10.1007/BF00204658
