Summary
We present a Grammatical Swarm (GS) approach for optimizing an aggregation operator. The operator combines the outputs of several classifiers into a single score, with the aim of producing an optimal ranking of the individuals. We apply the method to the identification of new members of a protein family. Support Vector Machine and Naive Bayes classifiers exploit complementary features to compute probability estimates. A major advantage of GS is that it produces an understandable algorithm that reveals the contribution of each classifier. Because the volume of candidate sequences is large, ranking quality is of crucial importance. Consequently, our fitness criterion is based on the Area Under the ROC Curve (AUC) rather than on the classification error rate. We discuss the performance obtained for a particular family, the cytokines, and show that this technique is an efficient means of ranking protein sequences.
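To make the fitness criterion concrete, the following minimal Python sketch scores a candidate aggregation operator by the AUC of the ranking it induces. It is an illustration under stated assumptions, not the chapter's implementation: the function names (`auc`, `fitness`), the toy probability values, and the candidate operators (`max`, arithmetic mean) are all hypothetical, and the Grammatical Swarm search that would generate such operators from a grammar is not shown. The AUC is computed through its equivalence with the Wilcoxon-Mann-Whitney statistic, that is, the fraction of (positive, negative) pairs ranked in the right order.

```python
from typing import Callable, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Wilcoxon-Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fitness(
    aggregate: Callable[[float, float], float],
    svm_probs: Sequence[float],
    nb_probs: Sequence[float],
    labels: Sequence[int],
) -> float:
    """Fitness of one candidate operator: AUC of the combined score."""
    combined = [aggregate(s, b) for s, b in zip(svm_probs, nb_probs)]
    return auc(combined, labels)


# Toy data (hypothetical): probability estimates from two classifiers
# for four sequences, the first two being true family members.
svm_probs = [0.9, 0.4, 0.6, 0.2]
nb_probs = [0.3, 0.8, 0.7, 0.1]
labels = [1, 1, 0, 0]

# Two hand-written candidate operators (assumptions for illustration;
# in the chapter such expressions are produced by the swarm).
candidates = {
    "max": max,
    "mean": lambda s, b: (s + b) / 2.0,
}

for name, op in candidates.items():
    print(name, fitness(op, svm_probs, nb_probs, labels))
```

Because the criterion depends only on the ordering of the scores, two operators that rank the candidates identically receive the same fitness, even when their raw scores differ. The pairwise loop is quadratic in the number of examples; a rank-based formulation would be preferable for large candidate sets.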
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ramstein, G., Beaume, N., Jacques, Y. (2009). Detection of Remote Protein Homologs Using Social Programming. In: Abraham, A., Hassanien, AE., de Carvalho, A.P.d.L.F. (eds) Foundations of Computational Intelligence Volume 4. Studies in Computational Intelligence, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01088-0_12
DOI: https://doi.org/10.1007/978-3-642-01088-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01087-3
Online ISBN: 978-3-642-01088-0
eBook Packages: Engineering, Engineering (R0)