Abstract
The life sciences continuously produce large amounts of complex data whose analysis requires relational learning to support knowledge discovery. Inductive Logic Programming (ILP) is a powerful method that allows an expressive representation of the data and produces explicit knowledge. However, ILP systems return theories that vary with the user's heuristic choices of parameters, and they may miss potentially relevant rules. We therefore propose an original approach based on post-ILP propositionalization of the examples and Formal Concept Analysis (FCA), which supports effective interpretation of the induced rules and allows domain knowledge to be added. We apply the approach to the characterization of three-dimensional (3D) protein-binding sites, the portions of a protein on which interactions with other proteins take place. We define a relational representation of protein 3D patches and formalize the task as a concept learning problem using ILP. We report the results obtained on a particular kind of protein-binding site, namely phosphorylation sites, using ILP followed by FCA-based interpretation.
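To make the combination of propositionalization and Formal Concept Analysis more concrete, the following minimal Python sketch may help. It is not the authors' implementation. It assumes that, after ILP, each example is described by the set of induced rules that cover it, and it enumerates the formal concepts of the resulting binary table with a naive closure procedure. The example identifiers (site1 to site4), the rule names (r1 to r3), and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: each example (object) is mapped to the set of
# induced rules (attributes) assumed to cover it. All names are invented.
coverage = {
    "site1": {"r1", "r2"},
    "site2": {"r1", "r2", "r3"},
    "site3": {"r2", "r3"},
    "site4": {"r3"},
}

def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`.

    For an empty collection of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def common_objects(attributes, context):
    """Return the objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    This is a naive, exponential procedure, suitable only for tiny contexts.
    """
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts

concepts = formal_concepts(coverage)

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair groups a maximal set of examples with the maximal set of rules they share, which is the kind of structure a concept lattice organizes for interpretation. For data of realistic size, a dedicated FCA tool would be needed instead of this exhaustive enumeration.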
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bresso, E., Grisoni, R., Devignes, MD., Napoli, A., Smail-Tabbone, M. (2013). ILP Characterization of 3D Protein-Binding Sites and FCA-Based Interpretation. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2012. Communications in Computer and Information Science, vol 415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54105-6_6
DOI: https://doi.org/10.1007/978-3-642-54105-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54104-9
Online ISBN: 978-3-642-54105-6
eBook Packages: Computer Science, Computer Science (R0)