Relational Sequence Alignments and Logos

Karwath, Andreas; Kersting, Kristian

doi:10.1007/978-3-540-73847-3_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4455)

Abstract

The need to measure sequence similarity arises in many application domains and often coincides with sequence alignment: the more similar two sequences are, the better they can be aligned. Aligning sequences not only shows how similar the sequences are, but also shows where they differ and where they correspond.

Traditionally, alignment has been considered for sequences of flat symbols only. Many real-world sequences, such as natural language sentences and protein secondary structures, however, exhibit rich internal structure. This is akin to the problem of dealing with structured examples studied in the field of inductive logic programming (ILP). In this paper, we introduce Real, a powerful yet simple approach to aligning sequences of structured symbols using well-established ILP distance measures within traditional alignment methods. Although the approach is straightforward, experiments on protein data and Medline abstracts show that it works well in practice, that the resulting alignments can indeed provide more information than flat ones, and that they are meaningful to experts when represented graphically.
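To make the idea of plugging a distance on structured symbols into a classical alignment method concrete, here is a minimal sketch. It is not the Real system itself. It assumes a Nienhuys-Cheng-style distance on ground terms and a Needleman-Wunsch-style global alignment that minimises total cost. The term encoding (tuples of functor and arguments), the gap penalty of 0.5, the example atoms, and the names `term_distance` and `align` are all illustrative assumptions, not taken from the paper.

```python
from typing import List, Optional, Tuple, Union

# Assumed encoding: a ground term is a constant (str) or a tuple
# (functor, arg_1, ..., arg_n). This representation is hypothetical.
Term = Union[str, tuple]


def term_distance(s: Term, t: Term) -> float:
    """Nienhuys-Cheng-style distance between two ground terms, in [0, 1]."""
    if s == t:
        return 0.0
    # Different constants, or a constant against a compound term.
    if isinstance(s, str) or isinstance(t, str):
        return 1.0
    # Compound terms with different functor or arity.
    if s[0] != t[0] or len(s) != len(t):
        return 1.0
    n = len(s) - 1  # arity; n >= 1 here, otherwise s == t above
    return sum(term_distance(a, b) for a, b in zip(s[1:], t[1:])) / (2 * n)


def align(
    xs: List[Term], ys: List[Term], gap: float = 0.5
) -> Tuple[float, List[Tuple[Optional[Term], Optional[Term]]]]:
    """Global alignment minimising summed term distances plus gap penalties."""
    n, m = len(xs), len(ys)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + term_distance(xs[i - 1], ys[j - 1]),
                cost[i - 1][j] + gap,  # xs[i-1] aligned with a gap
                cost[i][j - 1] + gap,  # ys[j-1] aligned with a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[Optional[Term], Optional[Term]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == (
            cost[i - 1][j - 1] + term_distance(xs[i - 1], ys[j - 1])
        ):
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and cost[i][j] == cost[i - 1][j] + gap):
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


# Hypothetical secondary-structure atoms: functor(type, length).
xs = [("helix", "alpha", "long"), ("strand", "medium"), ("helix", "alpha", "short")]
ys = [("helix", "alpha", "medium"), ("helix", "alpha", "short")]
total, pairs = align(xs, ys)
for x, y in pairs:
    print(x, "<->", y)
print("total cost:", total)
```

In this toy example the two helices that differ only in their length argument are matched at a small cost, while the strand is aligned with a gap. A flat-symbol alignment would treat every non-identical symbol as equally different, which is the kind of information loss the abstract refers to.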

Author information

Authors and Affiliations

University of Freiburg, Institute for Computer Science, Machine Learning Lab, Georges-Koehler-Allee, Building 079, 79110 Freiburg, Germany
Andreas Karwath & Kristian Kersting

Editor information

Stephen Muggleton, Ramon Otero, Alireza Tamaddoni-Nezhad

Cite this paper

Karwath, A., Kersting, K. (2007). Relational Sequence Alignments and Logos. In: Muggleton, S., Otero, R., Tamaddoni-Nezhad, A. (eds) Inductive Logic Programming. ILP 2006. Lecture Notes in Computer Science, vol 4455. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73847-3_29

DOI: https://doi.org/10.1007/978-3-540-73847-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73846-6
Online ISBN: 978-3-540-73847-3
eBook Packages: Computer Science, Computer Science (R0)