Abstract
In this paper, we illustrate a system aimed at predicting protein secondary structures. Our proposal falls in the category of multiple experts, a machine learning technique that, under the assumption of absent or negative correlation in the experts' errors, may outperform monolithic classifier systems. The prediction activity results from the interaction of a population of experts, each integrating genetic and neural technologies. Roughly speaking, an expert of this kind embodies a genetic classifier designed to control the activation of a feedforward artificial neural network. Genetic and neural components (i.e., guard and embedded predictor, respectively) perform different tasks and are supplied with different information. Each guard is aimed at (soft-)partitioning the input space, thereby ensuring both the diversity and the specialization of the corresponding embedded predictor, which in turn performs the actual prediction. Guards deal with inputs that encode information strictly related to relevant domain knowledge, whereas embedded predictors process other relevant inputs, each consisting of a limited window of residues. To investigate the performance of the proposed approach, a system has been implemented and tested on the RS126 set of proteins. Experimental results point to the potential of the approach.
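The abstract describes each expert as a genetic guard that gates a feedforward network reading a window of residues. The Python sketch below shows that shape only, under explicit assumptions that are not taken from the paper. It assumes a three-state alphabet (H, E, C). It uses hypothetical binary guard features (hydrophobic-rich window, nearby P or G, hydrophobic centre residue) and XCS-style ternary conditions over {0, 1, #}. Active experts are combined by a strength-weighted average, and the model falls back to coil when no guard matches. All names (`encode_window`, `domain_features`, `Guard`, `EmbeddedPredictor`, `GuardedExpert`, `predict_residue`) are illustrative. The networks are untrained and the genetic evolution of guards is not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
CLASSES = "HEC"                        # assumed 3-state alphabet: helix, strand, coil
HYDROPHOBIC = set("AVILMFWC")          # hypothetical grouping, used only by the guard features


def encode_window(seq, center, half_width):
    """One-hot encode the window centred on `center` (input of the embedded predictor).

    Each position gets 21 slots: 20 amino acids plus one slot for
    positions outside the chain or unknown residues.
    """
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            slot[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            slot[-1] = 1.0             # padding / unknown residue
        vec.extend(slot)
    return vec


def domain_features(seq, center, half_width=3):
    """Hypothetical binary features standing in for the guards' domain knowledge."""
    window = seq[max(0, center - half_width): center + half_width + 1]
    hydrophobic_ratio = sum(r in HYDROPHOBIC for r in window) / len(window)
    return "".join((
        "1" if hydrophobic_ratio > 0.5 else "0",
        "1" if "P" in window or "G" in window else "0",   # helix breakers nearby
        "1" if seq[center] in HYDROPHOBIC else "0",
    ))


class Guard:
    """Ternary condition over {0, 1, #}; '#' matches either bit."""

    def __init__(self, condition):
        self.condition = condition

    def matches(self, bits):
        return len(self.condition) == len(bits) and all(
            c == "#" or c == b for c, b in zip(self.condition, bits))


class EmbeddedPredictor:
    """One-hidden-layer MLP without biases; random weights, training not shown."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in CLASSES]

    def predict(self, x):
        hidden = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
                  for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        top = max(logits)                       # softmax, shifted for stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


class GuardedExpert:
    def __init__(self, condition, n_in, rng, strength=1.0):
        self.guard = Guard(condition)
        self.net = EmbeddedPredictor(n_in, n_hidden=8, rng=rng)
        self.strength = strength                # hypothetical reliability weight


def predict_residue(experts, seq, center, half_width=3):
    """Guards select the active experts; their outputs are strength-weighted and averaged."""
    bits = domain_features(seq, center)
    x = encode_window(seq, center, half_width)
    active = [e for e in experts if e.guard.matches(bits)]
    if not active:
        return "C", [0.0, 0.0, 1.0]             # assumed fallback: predict coil
    total_w = sum(e.strength for e in active)
    probs = [0.0] * len(CLASSES)
    for e in active:
        for k, p in enumerate(e.net.predict(x)):
            probs[k] += e.strength * p / total_w
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs


rng = random.Random(0)
half_width = 3
n_in = (2 * half_width + 1) * (len(AMINO_ACIDS) + 1)
# Overlapping conditions let several experts fire on the same residue ("soft" partitioning);
# the all-'#' expert acts as a catch-all.
experts = [GuardedExpert(cond, n_in, rng) for cond in ("1##", "#1#", "##0", "###")]
seq = "MKTAYIAKQRQISFVKSHFSRQ"
print("".join(predict_residue(experts, seq, i, half_width)[0] for i in range(len(seq))))
```

Because overlapping guards can activate several experts at once, the sketch mixes their outputs instead of choosing a single winner. This is one simple reading of "soft" partitioning, not necessarily the authors' combination rule.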
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Armano, G., Mancosu, G., Orro, A., Vargiu, E. (2005). A Multi-agent System for Protein Secondary Structure Prediction. In: Priami, C., Merelli, E., Gonzalez, P., Omicini, A. (eds) Transactions on Computational Systems Biology III. Lecture Notes in Computer Science, vol 3737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599128_2
DOI: https://doi.org/10.1007/11599128_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30883-6
Online ISBN: 978-3-540-31446-2
eBook Packages: Computer Science, Computer Science (R0)