Topological maps of protein sequences

Biological Cybernetics

Abstract

A new neural-network method for clustering proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The component (x, y) of each 20×20 matrix pattern is the normalized frequency of the amino acid pair xy in the sequence. We investigate the influence of different learning parameters on the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except those in which the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5%±7% of its amino acids and for fragments representing 7.5%±3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained with the Kohonen algorithm to obtain topological maps of protein sequences, in which related proteins are associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities for finding rapid and efficient algorithms to organize the whole protein database and search it for homologies.
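The encoding and training loop summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not settle: pairs xy are taken to be adjacent residues, frequencies are normalized by the total number of pairs, the map is a small rectangular grid with a square neighborhood, and the learning rate and radius decay linearly. The names `pair_frequency_matrix`, `winner`, and `train_som` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_frequency_matrix(sequence):
    """Flattened 20x20 pattern: component (x, y) is the normalized count of
    the adjacent pair xy (adjacency and normalization are assumptions)."""
    counts = [0.0] * 400
    total = 0
    for x, y in zip(sequence, sequence[1:]):
        if x in INDEX and y in INDEX:  # skip non-standard symbols
            counts[INDEX[x] * 20 + INDEX[y]] += 1.0
            total += 1
    if total == 0:
        return counts
    return [c / total for c in counts]


def winner(weights, pattern):
    """Index of the neuron whose synaptic vector is closest to the pattern."""
    def dist2(w):
        return sum((wi - pi) ** 2 for wi, pi in zip(w, pattern))
    return min(range(len(weights)), key=lambda k: dist2(weights[k]))


def train_som(patterns, rows, cols, epochs=50, alpha0=0.5, radius0=1, seed=0):
    """Kohonen-style training on a rows x cols grid (assumed schedule:
    learning rate and neighborhood radius shrink linearly to zero)."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    weights = [[rng.random() / dim for _ in range(dim)]
               for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        alpha = alpha0 * frac
        radius = round(radius0 * frac)
        for p in patterns:
            k = winner(weights, p)
            kr, kc = divmod(k, cols)
            for j, w in enumerate(weights):
                jr, jc = divmod(j, cols)
                # Update the winner and its grid neighbors toward the pattern.
                if max(abs(jr - kr), abs(jc - kc)) <= radius:
                    for i in range(dim):
                        w[i] += alpha * (p[i] - w[i])
    return weights


# Toy usage with made-up sequences (not the proteins used in the paper).
toy = ["ACACACACAC", "ACACACAAAC", "WYWYWYWYWY"]
patterns = [pair_frequency_matrix(s) for s in toy]
som = train_som(patterns, rows=2, cols=2)
labels = [winner(som, p) for p in patterns]
```

After training, a new sequence would be classified by encoding it with `pair_frequency_matrix` and looking up its winner neuron, which corresponds to the rapid classification step the abstract mentions.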




Cite this article

Ferrán, E.A., Ferrara, P. Topological maps of protein sequences. Biol. Cybern. 65, 451–458 (1991). https://doi.org/10.1007/BF00204658


  • DOI: https://doi.org/10.1007/BF00204658
