Summary
The era of data mining has brought renewed effort to areas of biology that, because of their difficulty and the lack of available knowledge, were and still are considered unsolved problems. One such problem, among the fundamental open problems in computational biology, is the prediction of the 3D structure of proteins, or protein structure prediction (PSP). Human experts, with the crucial help of data mining tools, are learning how proteins fold to form their structure, but are still far from providing perfect models for all kinds of proteins. Data mining and knowledge discovery are essential to advancing the understanding of the folding process. In this context, Learning Classifier Systems (LCS) are very competitive tools that have already shown their competence in many different data mining tasks. Moreover, they provide human-readable solutions that can help experts understand the PSP problem. In this chapter we describe our recent efforts in applying LCS to PSP-related domains. Specifically, we focus on a relevant PSP subproblem, Coordination Number (CN) prediction. CN is a kind of simplified profile of the 3D structure of a protein. Two kinds of experiments are described: the first analyzes different ways to represent the basic composition of a protein, its primary sequence, and the second assesses different data sources and problem definition methods for performing competent CN prediction. In all the experiments LCS show their competence in terms of both accurate predictions and explanatory power.
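To make the idea of CN as a simplified structural profile concrete, here is a minimal sketch. It assumes a common hard-cutoff reading of CN: for each residue, count the other residues whose representative atom lies within a distance threshold. The summary does not state the chapter's exact definition, so the function name, the `cutoff` value, the `min_separation` parameter, and the toy coordinates are all illustrative assumptions, not the authors' method.

```python
import math

def coordination_numbers(coords, cutoff=10.0, min_separation=2):
    """Per-residue contact counts under a hard distance cutoff.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: distance threshold (assumed value, not taken from the chapter).
    min_separation: minimum distance along the chain for a pair to count,
        so that immediate chain neighbours are ignored (an assumption).
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # A contact is symmetric: it counts for both residues.
                counts[i] += 1
                counts[j] += 1
    return counts

# Toy chain: four points on a line, 4 units apart (not real protein data).
toy = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
profile = coordination_numbers(toy, cutoff=10.0, min_separation=2)
print(profile)
```

The result is one integer per residue, which is the sense in which CN reduces a full 3D structure to a one-dimensional profile that a classifier can be trained to predict from sequence.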
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bacardit, J., Stout, M., Hirst, J.D., Krasnogor, N. (2008). Data Mining in Proteomics with Learning Classifier Systems. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_2
DOI: https://doi.org/10.1007/978-3-540-78979-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78978-9
Online ISBN: 978-3-540-78979-6
eBook Packages: Engineering, Engineering (R0)